
Conversation

@sujay-d07
Collaborator

@sujay-d07 sujay-d07 commented Dec 24, 2025

Related Issue

Closes #357

Summary

This pull request introduces major improvements to Cortex's LLM integration, making local, privacy-first inference via Ollama the new default. It adds a robust Ollama setup flow, updates documentation and installation instructions, and refines model selection and LLM routing logic. The changes ensure users can use Cortex completely offline, with cloud fallback as an option, and no API keys required for basic usage. The most important changes are grouped below:

Ollama Integration and Local LLM Support:

  • Added a new post-install hook (cortex/_post_install.py) that automatically runs Ollama setup after installation, including prompts for model selection and configuration.
  • Introduced the OllamaProvider class and integrated it throughout the codebase, making local LLM inference via Ollama the default for all tasks. [1] [2]
  • Updated the first-run wizard and model recommendation logic to default to codellama:7b and use new model naming conventions (e.g., phi3:mini, llama3:8b). [1] [2] [3]
  • Enhanced the LLM router to prioritize Ollama (local) first, followed by cloud providers (Claude, Kimi K2, OpenAI), and updated provider enums and documentation. [1] [2] [3] [4]
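
To make the local-first routing order concrete, here is a minimal, hedged sketch. Only OllamaProvider and its is_installed()/is_running() checks come from this PR; the function name and the environment-variable names are illustrative assumptions, not the actual LLMRouter code.

# Illustrative sketch of local-first provider selection with cloud fallback.
# Only OllamaProvider.is_installed()/is_running() are taken from this PR;
# the function and env var names are assumptions.
import os

from cortex.providers import OllamaProvider

def pick_provider(prefer_local: bool = True) -> str:
    """Return the provider to try first, preferring local Ollama when usable."""
    if prefer_local:
        ollama = OllamaProvider()
        if ollama.is_installed() and ollama.is_running():
            return "ollama"
    # Cloud fallback order described in the PR: Claude, then Kimi K2, then OpenAI
    for name, env_var in [
        ("claude", "ANTHROPIC_API_KEY"),   # assumed env var name
        ("kimi_k2", "MOONSHOT_API_KEY"),   # assumed env var name
        ("openai", "OPENAI_API_KEY"),      # assumed env var name
    ]:
        if os.getenv(env_var):
            return name
    raise RuntimeError("No LLM provider available")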

Documentation and User Experience:

  • Overhauled installation and usage instructions in README.md and AGENTS.md to emphasize privacy-first local LLMs, offline capability, and optional API keys; added detailed instructions for managing Ollama models and setup. [1] [2] [3] [4] [5] [6] [7] [8]
  • Added comprehensive changelog entries and updated manifest to include new scripts and documentation. [1] [2]

Robustness and Error Handling:

  • Improved parsing of LLM responses to handle markdown code blocks and partial JSON, with better error logging for debugging. [1] [2]
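
To illustrate the parsing idea (this is not the PR's actual implementation), a small hedged sketch that strips a markdown code fence and trims to the outermost JSON brackets before decoding:

# Illustrative sketch only - not the PR's parsing code.
import json
import re

def extract_json(raw: str):
    """Pull a JSON object or array out of a reply that may wrap it in ```json fences."""
    # Prefer the contents of a fenced code block when one is present.
    fence = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    candidate = fence.group(1) if fence else raw
    # Trim to the outermost JSON bracket so surrounding prose is ignored.
    starts = [i for i in (candidate.find("{"), candidate.find("[")) if i != -1]
    if not starts:
        raise ValueError("No JSON found in LLM response")
    start = min(starts)
    end = max(candidate.rfind("}"), candidate.rfind("]"))
    return json.loads(candidate[start:end + 1])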

Important Commands:

Install (Interactive Ollama Setup)
python setup.py develop

Full Setup (From Scratch)

git clone https://github.com/cortexlinux/cortex.git
cd cortex
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools wheel
python setup.py develop

Verify Installation

cortex --help
cortex install nginx --dry-run

Install Ollama Later (If Skipped)

python setup.py develop

Testing

# All tests
pytest tests/ -v

# Ollama-specific tests
pytest tests/test_ollama_integration.py tests/test_ollama_setup_integration.py -v

Migration (Old Install → New)

pip uninstall cortex-linux
python setup.py develop

Summary by CodeRabbit

  • New Features

    • Local LLM support via Ollama (privacy-first, offline-capable, zero-cost) with interactive setup, a new setup command, and a demo.
  • Changed

    • Routing now prefers local Ollama with automatic cloud fallback; API keys optional for cloud usage.
    • Default model recommendations and first-run flow updated for local-first operation; Python 3.10+ required.
  • Documentation

    • New Ollama quickstart, integration, auto-setup, and usage guides; README updated with privacy/offline messaging.
  • Tests

    • New integration tests and examples covering Ollama workflows and routing behavior.


Copilot AI review requested due to automatic review settings December 24, 2025 18:44
@coderabbitai
Contributor

coderabbitai bot commented Dec 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a local-first Ollama provider and integration: new OllamaProvider, LLMRouter changes to prefer Ollama with cloud fallbacks, provider/package wiring and post-install auto-setup, interpreter/default model updates, docs/examples/config, tests, and test output showing several Ollama connectivity/routing failures.

Changes

Cohort / File(s) Summary
Core Ollama Provider Implementation
cortex/providers/ollama_provider.py, cortex/providers/__init__.py
New OllamaProvider with install/start checks, model discovery/selection/pull, ensure_model_available, sync & streaming completion API, model introspection, and package re-export.
LLM Routing & Provider Integration
cortex/llm_router.py
Adds LLMProvider.OLLAMA, defaults many routes to Ollama, prefer_local ctor param, availability/model checks, _complete_ollama() path, fallback to cloud providers, cost/stats updates, and attempted-provider tracking.
Interpreter & First-run Defaults
cortex/llm/interpreter.py, cortex/first_run_wizard.py
Default Ollama model set to codellama:7b; _call_ollama() refactored to use OllamaProvider; improved JSON/command extraction; first-run messaging and model-pull command updated.
Hardware & Model Recommendations
cortex/kernel_features/hardware_detect.py
Updated recommended models for >=8GB to Ollama-style identifiers (phi3:mini, llama3:8b, gemma:2b, qwen2.5:7b).
Post-install Setup Automation & CLI
setup.py, cortex/_post_install.py, scripts/setup_ollama.py, scripts/__init__.py
Post-install/develop/egg_info hooks invoke setup_ollama(), new console entry cortex-setup-ollama, auto-install/start/pull flow with CI/non-interactive guards and marker file; setup wiring and packaging updates.
Examples, Config & Scripts
examples/ollama_demo.py, examples/sample-config.yaml, scripts/verify_ollama_setup.sh
Demo showing Ollama status, local completion, router fallback; sample config adds llm block for Ollama and cloud fallbacks; verification script for auto-setup.
Tests & Test Output
tests/test_ollama_integration.py, tests/test_llm_router.py, tests/test_cli.py, tests/test_ollama_setup_integration.py, tests/integration/*, test_output.txt
New Ollama integration tests and updated router/CLI tests reflecting Ollama-first routing and fallbacks; test run log shows multiple failures related to Ollama connectivity and routing expectations.
Docs & Packaging
CHANGELOG.md, README.md, docs/*, AGENTS.md, MANIFEST.in, pyproject.toml, setup.py
Extensive docs added/updated for Ollama integration and auto-setup; manifest includes scripts/; setup adds post-install hooks and cortex-setup-ollama entry point; license metadata formatting adjusted.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant LLMRouter
    participant OllamaProvider
    participant Local_Ollama
    participant CloudProvider

    User->>LLMRouter: complete(task, messages)
    activate LLMRouter

    rect rgb(232,245,233)
    Note over LLMRouter,OllamaProvider: Local-first path — ensure service & model
    LLMRouter->>OllamaProvider: ensure_model_available()
    activate OllamaProvider
    OllamaProvider->>Local_Ollama: is_running() / start_service()
    Local_Ollama-->>OllamaProvider: ready
    OllamaProvider->>Local_Ollama: /api/generate (prompt)
    Local_Ollama-->>OllamaProvider: streamed response
    OllamaProvider-->>LLMRouter: LLMResponse (Ollama)
    deactivate OllamaProvider
    end

    rect rgb(255,248,225)
    Note over LLMRouter,CloudProvider: Fallback if Ollama unavailable or errors
    alt Ollama unavailable or error
        LLMRouter->>CloudProvider: complete() via Claude/Kimi/OpenAI
        CloudProvider-->>LLMRouter: completion
    end
    end

    LLMRouter->>User: return LLMResponse
    deactivate LLMRouter

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • mikejmorgan-ai
  • Suyashd999

Poem

🐰 I hopped through code and pulled a friend,
Ollama local, no API to lend.
Models snug in my burrowed den,
Streaming answers, safe for when.
Carrots, code, and quiet zen.

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly and concisely summarizes the main change: adding Ollama integration for local LLM support. It directly reflects the primary objective of the changeset.
Linked Issues check ✅ Passed The PR implementation successfully addresses all primary objectives from issue #357: Ollama auto-detection, smart model selection, streaming support, offline operation, cloud fallback routing, multiple model support, and comprehensive documentation/tests.
Out of Scope Changes check ✅ Passed All changes are directly related to Ollama integration objectives. Minor updates to pyproject.toml (license format) and AGENTS.md installation instructions are necessary ancillary changes supporting the core feature.
Docstring Coverage ✅ Passed Docstring coverage is 89.87% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering all major changes including Ollama integration, documentation updates, robustness improvements, and testing instructions.

Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive local LLM support to Cortex Linux via Ollama integration, enabling privacy-first, offline-capable package management without cloud API dependencies. The implementation includes automated setup, intelligent provider routing with fallback, and extensive documentation.

Key Changes

  • New OllamaProvider with full model management and inference capabilities
  • Updated LLM router to prioritize local Ollama over cloud providers (Claude/OpenAI)
  • Automated post-install setup script for seamless user experience
  • Comprehensive test suite covering provider functionality, routing, and fallback logic

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
tests/test_ollama_integration.py New test suite with 221 lines covering Ollama integration
tests/test_llm_router.py Updated routing tests to accommodate Ollama-first priority
tests/test_cli.py Modified CLI test to reflect API-key-optional behavior
cortex/providers/ollama_provider.py New 439-line provider implementation for local LLM
cortex/llm_router.py Enhanced routing with Ollama priority and smart fallback
setup.py Added post-install hooks for automated Ollama setup
scripts/setup_ollama.py 259-line setup script for installation and configuration
examples/ollama_demo.py Interactive demo showcasing Ollama capabilities
Documentation files Comprehensive guides including quickstart and integration docs


Comment on lines 54 to 58
from requests.exceptions import ConnectionError
mock_get.side_effect = ConnectionError("Connection refused")

provider2 = OllamaProvider()
self.assertFalse(provider2.is_running())

Copilot AI Dec 24, 2025


The test expects ConnectionError to be raised when mocking a failed connection, but the exception type is incorrect. The code should raise requests.exceptions.ConnectionError, not a generic Exception("Connection refused"). The mock is setting side_effect = ConnectionError("Connection refused"), but this should be requests.exceptions.ConnectionError("Connection refused") for proper exception handling.
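
For illustration, a hedged sketch of the corrected mock; the patch target path is an assumption, not the PR's exact test code:

# Sketch only: the patch target path below is assumed, not copied from the PR.
from unittest.mock import patch

import requests

from cortex.providers import OllamaProvider

with patch("cortex.providers.ollama_provider.requests.get") as mock_get:
    mock_get.side_effect = requests.exceptions.ConnectionError("Connection refused")
    provider = OllamaProvider()
    assert provider.is_running() is False  # RequestException subclass is caught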

Comment on lines 466 to 475
# System operation should try Ollama first, then fallback chain
response = router.complete(
    messages=[{"role": "user", "content": "Install CUDA"}],
    task_type=TaskType.SYSTEM_OPERATION,
)

self.assertEqual(response.provider, LLMProvider.CLAUDE)
self.assertEqual(response.content, "Fallback response")
# With Ollama integration, could be Ollama or any fallback provider
self.assertIn(response.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE, LLMProvider.KIMI_K2])
# Check response content exists
self.assertIsNotNone(response.content)

Copilot AI Dec 24, 2025


The test expects Claude as the fallback provider but with Ollama integration, the actual provider returned is OLLAMA. The test needs to be updated to reflect the new fallback priority: Ollama → Claude → Kimi K2.

Comment on lines 63 to 66
def test_install_no_api_key(self):
    # With Ollama integration, no API key is needed - should succeed
    result = self.cli.install("docker")
    self.assertEqual(result, 1)
    self.assertEqual(result, 0)

Copilot AI Dec 24, 2025


The test expects a return code of 1 (failure) when no API key is provided, but with Ollama integration, the system no longer requires API keys and returns 0 (success). The test expectation should be updated to reflect that Ollama enables usage without API keys.

Comment on lines 145 to 152
try:
    response = requests.get(
        f"{self.base_url}/api/tags",
        timeout=5
    )
    return response.status_code == 200
except requests.RequestException:
    return False

Copilot AI Dec 24, 2025


The exception handling catches a broad Exception but the actual exception type raised by requests is ConnectionError. The code should specifically handle requests.exceptions.RequestException or requests.exceptions.ConnectionError for better error handling clarity.

Comment on lines 171 to 190
mock_ollama.complete.return_value = {
    "response": "Install nginx using apt-get",
    "model": "llama3:8b"
}
mock_ollama_class.return_value = mock_ollama

router = LLMRouter()
router.ollama_client = mock_ollama  # Ensure router uses our mock

messages = [{"role": "user", "content": "How to install nginx?"}]
response = router.complete(
    messages=messages,
    task_type=TaskType.SYSTEM_OPERATION,
    force_provider=LLMProvider.OLLAMA
)

self.assertEqual(response.provider, LLMProvider.OLLAMA)
# Check that complete was called on the mock
mock_ollama.complete.assert_called_once()
self.assertIn("nginx", response.content.lower())

Copilot AI Dec 24, 2025


The test is not properly mocking the Ollama provider's complete method return value. The mock returns a MagicMock object, but the test expects the response content to be a string containing "nginx". The mock should return a dict with a "response" key containing the actual content string, not a MagicMock object.

return False

except subprocess.TimeoutExpired:
logger.warning("⚠️ Model download timed out - try again with: ollama pull {model_name}")

Copilot AI Dec 24, 2025


The format string contains a non-existent variable reference. The warning message uses {model_name} but this variable is not in scope at line 204. This will cause an error if the timeout branch is executed. The variable should be referenced from the function context or removed from the message.

Suggested change
logger.warning("⚠️ Model download timed out - try again with: ollama pull {model_name}")
logger.warning(f"⚠️ Model download timed out - try again with: ollama pull {model_name}")

Comment on lines +57 to +62
def __init__(
    self,
    base_url: str | None = None,
    timeout: int = 300,
    auto_pull: bool = True,
):

Copilot AI Dec 24, 2025


The function definition uses Python 3.10+ union type syntax str | None but also includes type hints that may not be compatible with all Python versions. Consider using Optional[str] from typing for better backwards compatibility, or ensure the codebase consistently requires Python 3.10+.

Comment on lines 124 to 129
subprocess.run(["ollama", "serve"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True)
time.sleep(2) # Give service time to start
return True

Copilot AI Dec 24, 2025


Using hardcoded sleep times for service startup is unreliable and can cause race conditions. The service might not be ready after 2 seconds on slower systems, or this could waste time on faster systems. Consider implementing a proper retry loop with health checks instead of fixed sleep durations.

Suggested change
subprocess.run(["ollama", "serve"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True)
time.sleep(2) # Give service time to start
return True
subprocess.run(
["ollama", "serve"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
# Wait for Ollama service to become ready instead of using a fixed sleep
start_time = time.time()
timeout_seconds = 30
health_url = "http://localhost:11434/api/tags"
while time.time() - start_time < timeout_seconds:
try:
response = requests.get(health_url, timeout=2)
if response.status_code == 200:
logger.info("✅ Ollama service is running")
return True
except requests.RequestException:
# Service not ready yet; keep retrying until timeout
pass
time.sleep(0.5)
logger.error("Ollama service did not become ready within the expected time")
return False

"""

import unittest
from unittest.mock import Mock, patch, MagicMock

Copilot AI Dec 24, 2025


Import of 'MagicMock' is not used.

Suggested change
from unittest.mock import Mock, patch, MagicMock
from unittest.mock import Mock, patch

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
cortex/kernel_features/hardware_detect.py (1)

346-377: Fix inconsistent model naming: all models must use Ollama's colon format (model:tag).

Lines 355, 360, 364, 370, and 374 use dash notation (e.g., llama3.1-70b-q4, mistral-7b), which is incompatible with Ollama's /api/pull API. Line 367 correctly uses colon notation (phi3:mini, llama3:8b). The OllamaProvider will fail to pull models with incorrect names.

Convert all model names to colon format:

  • llama3.1-70b-q4llama3.1:70b-q4
  • qwen2.5-72b-q4qwen2.5:72b-q4
  • codellama-34b-q4codellama:34b-q4
  • mistral-7bmistral:7b
  • phi-2phi3:2.5b (or verify correct Ollama model name)
  • tinyllama-1.1btinyllama:1.1b

For NPU variants (line 374: phi-3-mini-npu, qwen2.5-1.5b-npu), verify if these exist in Ollama registry before including them.

README.md (1)

370-375: Project status section is inconsistent with this PR.

Line 373 lists "Ollama local model support" under "In Progress", but this PR implements full Ollama integration. Update the status to reflect the completed feature.

🔎 Proposed fix
 ### In Progress
 - [ ] Conflict resolution UI
 - [ ] Multi-step orchestration
-- [ ] Ollama local model support
 - [ ] MCP server integration
 - [ ] Snap/Flatpak support
+
+### Completed
+- [x] Natural language to package resolution
+- [x] Claude and OpenAI integration
+- [x] Installation history and rollback
+- [x] User preferences (YAML-backed)
+- [x] Hardware detection (GPU/CPU/Memory)
+- [x] Firejail sandboxing
+- [x] Dry-run preview mode
+- [x] Ollama local model support
cortex/llm_router.py (2)

521-544: Ollama stats missing from get_stats() return value.

The provider_stats dictionary tracks Ollama usage (line 203), but get_stats() only returns stats for Claude and Kimi K2, omitting Ollama entirely.

🔎 Proposed fix
         with self._stats_lock:
             return {
                 "total_requests": self.request_count,
                 "total_cost_usd": round(self.total_cost_usd, 4),
                 "providers": {
+                    "ollama": {
+                        "requests": self.provider_stats[LLMProvider.OLLAMA]["requests"],
+                        "tokens": self.provider_stats[LLMProvider.OLLAMA]["tokens"],
+                        "cost_usd": round(self.provider_stats[LLMProvider.OLLAMA]["cost"], 4),
+                    },
                     "claude": {
                         "requests": self.provider_stats[LLMProvider.CLAUDE]["requests"],
                         "tokens": self.provider_stats[LLMProvider.CLAUDE]["tokens"],
                         "cost_usd": round(self.provider_stats[LLMProvider.CLAUDE]["cost"], 4),
                     },
                     "kimi_k2": {
                         "requests": self.provider_stats[LLMProvider.KIMI_K2]["requests"],
                         "tokens": self.provider_stats[LLMProvider.KIMI_K2]["tokens"],
                         "cost_usd": round(self.provider_stats[LLMProvider.KIMI_K2]["cost"], 4),
                     },
                 },
             }

591-626: acomplete() doesn't support Ollama routing.

The async completion method only handles Claude and Kimi K2 providers (lines 592-595), but route_task() can return LLMProvider.OLLAMA. This will cause Ollama requests to incorrectly fall through to the Kimi K2 path.

🔎 Proposed fix
         try:
-            if routing.provider == LLMProvider.CLAUDE:
+            if routing.provider == LLMProvider.OLLAMA:
+                # Ollama doesn't have native async - run sync in executor
+                loop = asyncio.get_event_loop()
+                response = await loop.run_in_executor(
+                    None,
+                    lambda: self._complete_ollama(messages, temperature, max_tokens)
+                )
+            elif routing.provider == LLMProvider.CLAUDE:
                 response = await self._acomplete_claude(messages, temperature, max_tokens, tools)
             else:  # KIMI_K2
                 response = await self._acomplete_kimi(messages, temperature, max_tokens, tools)

Also update the fallback logic (lines 609-614) to include Ollama:

             if self.enable_fallback:
-                fallback_provider = (
-                    LLMProvider.KIMI_K2
-                    if routing.provider == LLMProvider.CLAUDE
-                    else LLMProvider.CLAUDE
-                )
+                if routing.provider == LLMProvider.OLLAMA:
+                    fallback_provider = (
+                        LLMProvider.CLAUDE if self.claude_client_async
+                        else LLMProvider.KIMI_K2 if self.kimi_client_async
+                        else None
+                    )
+                elif routing.provider == LLMProvider.CLAUDE:
+                    fallback_provider = (
+                        LLMProvider.KIMI_K2 if self.kimi_client_async
+                        else None
+                    )
+                else:  # KIMI_K2
+                    fallback_provider = (
+                        LLMProvider.CLAUDE if self.claude_client_async
+                        else None
+                    )
+                
+                if not fallback_provider:
+                    raise RuntimeError("No fallback provider available")
🧹 Nitpick comments (11)
docs/PR_OLLAMA_INTEGRATION.md (2)

60-62: Add language identifier to fenced code block.

The fallback logic code block is missing a language identifier. Consider adding text or mermaid for proper rendering.

🔎 Proposed fix
-```
+```text
 Ollama (local) → Claude → Kimi K2 → Error

306-307: Format URLs as markdown links.

Bare URLs should be properly formatted as markdown links for better readability and accessibility.

🔎 Proposed fix
-- **Discord:** https://discord.gg/uCqHvxjU83
-- **Email:** mike@cortexlinux.com
+- **Discord:** [https://discord.gg/uCqHvxjU83](https://discord.gg/uCqHvxjU83)
+- **Email:** [mike@cortexlinux.com](mailto:mike@cortexlinux.com)
docs/OLLAMA_INTEGRATION.md (1)

42-42: Specify language for code block.

The bash command should be wrapped in a code block with language specification for proper syntax highlighting.

🔎 Proposed fix
+```bash
 cortex install nginx
+```
cortex/llm/interpreter.py (1)

144-180: Verify error message and exception type consistency.

The refactored _call_ollama method has good integration with OllamaProvider, but there's a minor inconsistency:

Line 159 error message says "No Ollama models available. Run: ollama pull llama3:8b", but the code now uses codellama:7b as the default model (line 65). The error message should suggest pulling the default model for consistency.

🔎 Proposed fix
             model = ollama.ensure_model_available()
             if not model:
-                raise RuntimeError("No Ollama models available. Run: ollama pull llama3:8b")
+                raise RuntimeError("No Ollama models available. Run: ollama pull codellama:7b")
examples/ollama_demo.py (1)

21-22: Relative path assumption may fail in different execution contexts.

Line 22 uses sys.path.insert(0, '..') to add the parent directory to the Python path. This assumes the script is always run from the examples/ directory, which may not hold true if:

  • The script is run from the project root
  • The script is executed via absolute path from another directory
  • The script is installed as a package
🔎 Proposed fix

Use __file__ to determine the script's location reliably:

+import os
 import sys
 import time
 from rich.console import Console
 from rich.table import Table
 from rich.panel import Panel
 
 # Add parent directory to path
-sys.path.insert(0, '..')
+script_dir = os.path.dirname(os.path.abspath(__file__))
+parent_dir = os.path.dirname(script_dir)
+sys.path.insert(0, parent_dir)
scripts/setup_ollama.py (2)

40-76: Security consideration: Executing downloaded script from external source.

Lines 43-61 download and execute a shell script from https://ollama.com/install.sh. While this is the official Ollama installation method, it presents security risks:

  1. No integrity verification: The script is executed without checksum validation
  2. HTTPS trust: Relies entirely on HTTPS/TLS for authenticity
  3. Arbitrary code execution: Downloaded content is piped to sh -c

This is acceptable since:

  • It's the official Ollama installation method documented by Ollama
  • Users can skip with CORTEX_SKIP_OLLAMA_SETUP=1
  • Error handling prevents installation failures from blocking Cortex setup

However, consider documenting this security trade-off in the code comments.

Suggested documentation addition
     logger.info("📦 Installing Ollama for local LLM support...")
     logger.info("   This enables privacy-first, offline package management")
+    
+    # Security note: This downloads and executes the official Ollama install script.
+    # Users can skip this by setting CORTEX_SKIP_OLLAMA_SETUP=1 environment variable.
+    # See: https://github.com/ollama/ollama for installation verification.
     
     try:

100-102: Fixed sleep duration may be insufficient or excessive.

Line 101 uses time.sleep(2) after starting the Ollama service. This arbitrary 2-second wait may:

  • Be insufficient on slower systems (service not ready yet)
  • Be excessive on fast systems (unnecessary delay)
🔎 Proposed fix

Implement a retry loop with timeout instead of fixed sleep:

         # Give it a moment to start
-        time.sleep(2)
-        logger.info("✅ Ollama service started")
-        return True
+        # Poll for service readiness
+        max_attempts = 10
+        for attempt in range(max_attempts):
+            time.sleep(1)
+            try:
+                import requests
+                response = requests.get("http://localhost:11434/api/tags", timeout=2)
+                if response.status_code == 200:
+                    logger.info("✅ Ollama service started")
+                    return True
+            except Exception:
+                pass
+        
+        logger.warning("⚠️  Service started but not responding yet")
+        return True  # Don't fail - service may still be starting
cortex/llm_router.py (2)

388-390: Token estimation is imprecise.

The word-count-based estimation (len(content.split()) * 1.3) is a rough approximation. Consider using Ollama's actual token count from the response if available, or document this limitation.

🔎 Proposed improvement
-        # Ollama doesn't provide token counts in the same way
-        # Estimate based on response length
-        tokens_used = len(content.split()) * 1.3  # Rough estimate
+        # Use Ollama's token counts if available, otherwise estimate
+        eval_count = response_data.get("eval_count", 0)
+        prompt_eval_count = response_data.get("prompt_eval_count", 0)
+        if eval_count or prompt_eval_count:
+            tokens_used = eval_count + prompt_eval_count
+        else:
+            # Fallback: estimate based on response length
+            tokens_used = int(len(content.split()) * 1.3)

319-354: Fallback logic could recurse indefinitely in edge cases.

If a provider fails repeatedly and fallback keeps selecting the same failing provider chain, this could cause deep recursion. The current logic mitigates this by using force_provider, but consider adding a recursion guard.

🔎 Proposed improvement

Add a _fallback_depth parameter to prevent unbounded recursion:

     def complete(
         self,
         messages: list[dict[str, str]],
         task_type: TaskType = TaskType.USER_CHAT,
         force_provider: LLMProvider | None = None,
         temperature: float = 0.7,
         max_tokens: int = 4096,
         tools: list[dict] | None = None,
+        _fallback_depth: int = 0,
     ) -> LLMResponse:

Then guard before recursing:

                 if fallback_provider:
+                    if _fallback_depth >= 2:
+                        raise RuntimeError("Max fallback depth exceeded")
                     logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")

                     return self.complete(
                         messages=messages,
                         task_type=task_type,
                         force_provider=fallback_provider,
                         temperature=temperature,
                         max_tokens=max_tokens,
                         tools=tools,
+                        _fallback_depth=_fallback_depth + 1,
                     )
cortex/providers/ollama_provider.py (2)

263-285: Model pull timeout may be insufficient for large models.

The pull_model() method uses self.timeout (default 300 seconds) for downloading models. Models like deepseek-coder-v2:16b (10GB+) may require significantly longer on slower connections.

🔎 Proposed fix
     def pull_model(self, model_name: str) -> bool:
         """
         Pull a model from Ollama registry.
         
         Args:
             model_name: Name of model to pull
             
         Returns:
             True if successful, False otherwise
         """
         logger.info(f"📥 Pulling model: {model_name}")
         
         try:
             response = requests.post(
                 f"{self.base_url}/api/pull",
                 json={"name": model_name},
                 stream=True,
-                timeout=self.timeout
+                timeout=None  # No timeout for model downloads
             )

Alternatively, add a separate pull_timeout parameter to __init__.


374-398: Prompt format may not be optimal for all models.

The simple System:/User:/Assistant: format works for basic models but may not be optimal for instruction-tuned models that expect specific prompt templates (e.g., Llama 3's <|begin_of_text|> format, Mistral's [INST] format). Consider using Ollama's native chat API (/api/chat) instead.

🔎 Proposed improvement
     def complete(
         self,
         messages: list[dict[str, str]],
         model: str | None = None,
         temperature: float = 0.7,
         max_tokens: int = 4096,
         stream: bool = False,
     ) -> dict[str, Any] | Generator[dict[str, Any], None, None]:
         # ... existing setup code ...
         
-        # Convert messages to Ollama format
-        prompt = self._messages_to_prompt(messages)
-
         try:
             response = requests.post(
-                f"{self.base_url}/api/generate",
+                f"{self.base_url}/api/chat",
                 json={
                     "model": model,
-                    "prompt": prompt,
-                    "temperature": temperature,
+                    "messages": messages,  # Native OpenAI-compatible format
                     "options": {
+                        "temperature": temperature,
                         "num_predict": max_tokens,
                     },
                     "stream": stream,
                 },

This uses Ollama's /api/chat endpoint which natively understands OpenAI-style message formats.
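
For reference, a standalone hedged sketch of a non-streaming /api/chat call (separate from the diff above); the base URL, model name, and timeout are placeholder assumptions:

# Minimal sketch of calling Ollama's /api/chat endpoint with OpenAI-style messages.
# Base URL, model name, and timeout are placeholder assumptions.
import requests

def chat_once(messages, model="llama3:8b", base_url="http://localhost:11434"):
    resp = requests.post(
        f"{base_url}/api/chat",
        json={
            "model": model,
            "messages": messages,            # [{"role": "user", "content": "..."}]
            "options": {"temperature": 0.7},
            "stream": False,                 # one JSON object instead of a stream
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]

print(chat_once([{"role": "user", "content": "How do I install nginx?"}]))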

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6dd6e0 and 1bed948.

📒 Files selected for processing (20)
  • CHANGELOG.md
  • README.md
  • cortex/first_run_wizard.py
  • cortex/kernel_features/hardware_detect.py
  • cortex/llm/interpreter.py
  • cortex/llm_router.py
  • cortex/providers/__init__.py
  • cortex/providers/ollama_provider.py
  • docs/OLLAMA_IMPLEMENTATION_SUMMARY.md
  • docs/OLLAMA_INTEGRATION.md
  • docs/OLLAMA_QUICKSTART.md
  • docs/PR_OLLAMA_INTEGRATION.md
  • examples/ollama_demo.py
  • examples/sample-config.yaml
  • scripts/setup_ollama.py
  • setup.py
  • test_output.txt
  • tests/test_cli.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/providers/__init__.py
  • tests/test_cli.py
  • cortex/kernel_features/hardware_detect.py
  • cortex/llm/interpreter.py
  • cortex/first_run_wizard.py
  • tests/test_llm_router.py
  • scripts/setup_ollama.py
  • examples/ollama_demo.py
  • cortex/llm_router.py
  • cortex/providers/ollama_provider.py
  • tests/test_ollama_integration.py
  • setup.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • cortex/providers/__init__.py
  • setup.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_cli.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
🧬 Code graph analysis (9)
cortex/providers/__init__.py (1)
cortex/providers/ollama_provider.py (1)
  • OllamaProvider (30-439)
tests/test_cli.py (1)
cortex/cli.py (1)
  • install (297-525)
cortex/llm/interpreter.py (1)
cortex/providers/ollama_provider.py (4)
  • OllamaProvider (30-439)
  • is_running (138-152)
  • start_service (154-191)
  • ensure_model_available (287-311)
cortex/first_run_wizard.py (2)
setup.py (2)
  • run (10-18)
  • run (23-31)
mcp/cortex_mcp_server.py (1)
  • run (240-244)
tests/test_llm_router.py (1)
cortex/llm_router.py (4)
  • route_task (208-268)
  • TaskType (35-45)
  • LLMProvider (48-53)
  • complete (270-354)
scripts/setup_ollama.py (3)
cortex/providers/ollama_provider.py (1)
  • install_ollama (88-136)
cortex/logging_system.py (2)
  • info (198-200)
  • warning (202-204)
setup.py (2)
  • run (10-18)
  • run (23-31)
examples/ollama_demo.py (2)
cortex/providers/ollama_provider.py (6)
  • is_installed (78-85)
  • is_running (138-152)
  • start_service (154-191)
  • get_available_models (193-216)
  • ensure_model_available (287-311)
  • complete (313-372)
cortex/llm_router.py (3)
  • TaskType (35-45)
  • complete (270-354)
  • route_task (208-268)
cortex/providers/ollama_provider.py (3)
scripts/setup_ollama.py (1)
  • install_ollama (26-76)
cortex/logging_system.py (2)
  • info (198-200)
  • warning (202-204)
cortex/llm_router.py (1)
  • complete (270-354)
tests/test_ollama_integration.py (2)
cortex/providers/ollama_provider.py (2)
  • is_installed (78-85)
  • complete (313-372)
cortex/llm_router.py (4)
  • LLMProvider (48-53)
  • TaskType (35-45)
  • route_task (208-268)
  • complete (270-354)
🪛 GitHub Actions: CI
cortex/llm/interpreter.py

[error] 148-148: ruff: W293 Blank line contains whitespace.

🪛 GitHub Check: lint
cortex/llm/interpreter.py

[failure] 175-175: Ruff (W293)
cortex/llm/interpreter.py:175:1: W293 Blank line contains whitespace


[failure] 166-166: Ruff (W293)
cortex/llm/interpreter.py:166:1: W293 Blank line contains whitespace


[failure] 160-160: Ruff (W293)
cortex/llm/interpreter.py:160:1: W293 Blank line contains whitespace


[failure] 156-156: Ruff (W293)
cortex/llm/interpreter.py:156:1: W293 Blank line contains whitespace


[failure] 151-151: Ruff (W293)
cortex/llm/interpreter.py:151:1: W293 Blank line contains whitespace


[failure] 148-148: Ruff (W293)
cortex/llm/interpreter.py:148:1: W293 Blank line contains whitespace


[failure] 221-221: Ruff (W293)
cortex/llm/interpreter.py:221:1: W293 Blank line contains whitespace


[failure] 218-218: Ruff (W293)
cortex/llm/interpreter.py:218:1: W293 Blank line contains whitespace


[failure] 210-210: Ruff (W293)
cortex/llm/interpreter.py:210:1: W293 Blank line contains whitespace


[failure] 207-207: Ruff (W293)
cortex/llm/interpreter.py:207:1: W293 Blank line contains whitespace

🪛 GitHub Check: Lint
cortex/llm/interpreter.py

[failure] 175-175: Ruff (W293)
cortex/llm/interpreter.py:175:1: W293 Blank line contains whitespace


[failure] 166-166: Ruff (W293)
cortex/llm/interpreter.py:166:1: W293 Blank line contains whitespace


[failure] 160-160: Ruff (W293)
cortex/llm/interpreter.py:160:1: W293 Blank line contains whitespace


[failure] 156-156: Ruff (W293)
cortex/llm/interpreter.py:156:1: W293 Blank line contains whitespace


[failure] 151-151: Ruff (W293)
cortex/llm/interpreter.py:151:1: W293 Blank line contains whitespace


[failure] 148-148: Ruff (W293)
cortex/llm/interpreter.py:148:1: W293 Blank line contains whitespace


[failure] 221-221: Ruff (W293)
cortex/llm/interpreter.py:221:1: W293 Blank line contains whitespace


[failure] 218-218: Ruff (W293)
cortex/llm/interpreter.py:218:1: W293 Blank line contains whitespace


[failure] 210-210: Ruff (W293)
cortex/llm/interpreter.py:210:1: W293 Blank line contains whitespace


[failure] 207-207: Ruff (W293)
cortex/llm/interpreter.py:207:1: W293 Blank line contains whitespace

🪛 LanguageTool
docs/OLLAMA_QUICKSTART.md

[style] ~135-~135: Using many exclamation marks might seem excessive (in this case: 4 exclamation marks for a text that’s 1384 characters long)
Context: ...come to privacy-first package management!** 🎉

(EN_EXCESSIVE_EXCLAMATION)

docs/OLLAMA_IMPLEMENTATION_SUMMARY.md

[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...ntation Summary Date: December 24, 2025 Feature: Local LLM Support via Ol...

(MISSING_COMMA_AFTER_YEAR)

🪛 markdownlint-cli2 (0.18.1)
docs/OLLAMA_INTEGRATION.md

42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


91-91: Bare URL used

(MD034, no-bare-urls)

docs/PR_OLLAMA_INTEGRATION.md

60-60: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


306-306: Bare URL used

(MD034, no-bare-urls)


307-307: Bare URL used

(MD034, no-bare-urls)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Agent
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🔇 Additional comments (19)
examples/sample-config.yaml (1)

5-28: LGTM! Well-structured LLM configuration.

The new LLM configuration section is clear and comprehensive, providing good examples for both local Ollama usage and cloud provider fallbacks. The model preferences align with the code-focused optimization mentioned in the PR objectives.

tests/test_cli.py (1)

63-66: LGTM! Test correctly reflects Ollama fallback behavior.

The updated test expectation is correct—with Ollama integration, the CLI should succeed even without cloud API keys by falling back to local Ollama. The comment provides helpful context for this behavioral change.

docs/OLLAMA_QUICKSTART.md (1)

1-135: LGTM! Excellent quickstart guide.

The quickstart documentation is well-structured, clear, and appropriately progressive—starting with basic installation and gradually introducing advanced features. The examples are practical and the troubleshooting section addresses common issues effectively.

tests/test_llm_router.py (3)

38-88: LGTM! Routing tests appropriately updated for Ollama integration.

The test updates correctly reflect the new routing behavior where Ollama is the default local provider with smart fallbacks to cloud providers. The use of assertIn with multiple allowed providers makes the tests resilient to fallback scenarios while still validating correct routing behavior.


100-108: LGTM! Fallback test correctly handles Ollama unavailability.

The test appropriately verifies that when Ollama is unavailable, the system falls back to cloud providers (Claude or Kimi K2) as expected. This validates the smart fallback logic described in the PR objectives.


428-476: LGTM! End-to-end tests reflect Ollama-first routing.

The integration tests correctly validate the complete routing flow with Ollama as the preferred provider and appropriate fallbacks. The test expectations account for real-world scenarios where providers may be unavailable.

docs/OLLAMA_IMPLEMENTATION_SUMMARY.md (1)

1-377: LGTM! Comprehensive implementation documentation.

This implementation summary provides excellent technical depth and coverage of the Ollama integration. The document effectively serves as both a reference for developers and a guide for users, with clear sections on architecture, usage, configuration, and migration. The performance considerations and security sections are particularly valuable.

CHANGELOG.md (1)

11-50: LGTM!

The changelog entries accurately document the Ollama integration changes and follow the Keep a Changelog format. The additions cover features, changes, and security enhancements comprehensively.

cortex/llm/interpreter.py (1)

197-257: Enhanced JSON parsing logic looks robust.

The improved _parse_commands method properly handles:

  • Markdown code block extraction (lines 199-206)
  • JSON object/array detection (lines 211-220)
  • Matching bracket counting (lines 223-244)
  • Error logging with context (lines 254-257)

This significantly improves resilience to varied LLM response formats.

examples/ollama_demo.py (1)

30-67: LGTM! Well-structured demo with good error handling.

The demo functions effectively showcase:

  • Ollama status checking with clear feedback
  • Local completion with timing metrics
  • Router integration with fallback demonstration
  • Provider comparison table

Error handling is properly implemented with try-except blocks, and Rich console output provides a professional user experience.

Also applies to: 70-98, 100-135, 137-157

setup.py (1)

8-19: Post-install hooks gracefully handle failures.

Both PostInstallCommand and PostDevelopCommand properly:

  • Run the base class installation first
  • Execute the Ollama setup script with check=False (non-blocking)
  • Catch exceptions and provide user-friendly messages
  • Inform users about manual setup options

This ensures installation doesn't fail if Ollama setup encounters issues.

Also applies to: 21-31
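
A minimal sketch of the non-blocking hook pattern described above; the class name and script path follow the review text, but details may differ from the PR:

# Sketch of a post-install hook that must not fail the installation.
import subprocess
import sys

from setuptools.command.install import install

class PostInstallCommand(install):
    def run(self):
        install.run(self)  # run the normal installation first
        try:
            # check=False keeps a failed Ollama setup from breaking the install
            subprocess.run([sys.executable, "scripts/setup_ollama.py"], check=False)
        except Exception:
            print("Ollama setup skipped - run manually: python scripts/setup_ollama.py")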

scripts/setup_ollama.py (1)

211-256: LGTM! Proper environment detection and graceful degradation.

The setup_ollama() function properly:

  • Checks for skip flags (CORTEX_SKIP_OLLAMA_SETUP, CI, GITHUB_ACTIONS) to avoid running in automated environments
  • Detects interactive terminals with sys.stdin.isatty() to skip prompts in non-interactive mode
  • Provides clear user guidance on manual setup if automated setup fails
  • Handles failures gracefully without blocking Cortex installation
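
A hedged sketch of the guard logic described in these bullets; the helper name is an assumption, while the environment variables and the isatty() check come from the review text:

# Illustrative guard: decide when to skip prompts and automated Ollama setup.
import os
import sys

def is_non_interactive() -> bool:
    """True in CI, when setup is explicitly skipped, or when stdin is not a TTY."""
    if os.getenv("CORTEX_SKIP_OLLAMA_SETUP"):
        return True
    if os.getenv("CI") or os.getenv("GITHUB_ACTIONS"):
        return True
    return not sys.stdin.isatty()
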
docs/OLLAMA_INTEGRATION.md (1)

707-773: The test expectations have already been updated to reflect Ollama-first routing—no further changes needed.

All tests mentioned in this review have been corrected to expect Ollama as the primary provider with fallback to other providers:

  • test_install_no_api_key: Already expects exit code 0 (success with Ollama)
  • test_complete_with_routing: Already accepts OLLAMA as a valid provider alongside fallbacks
  • test_fallback_on_error: Already accepts OLLAMA as a valid provider alongside fallbacks
  • test_complete_batch: Already updated to match current routing
  • test_is_running: Already fixed with proper ConnectionError handling
  • test_complete_with_ollama: Already returns proper dict structure

The LLMRouter class confirms all task types route to LLMProvider.OLLAMA as the primary provider by default.

README.md (2)

66-75: LGTM!

The updated feature table effectively communicates the privacy-first, offline-capable nature of the Ollama integration with clear icons and descriptions.


174-206: LGTM!

The Local LLM Support section is comprehensive, covering benefits, recommended models with sizes, management commands, and cloud fallback configuration. The documentation aligns well with the implementation.

cortex/llm_router.py (1)

148-167: LGTM!

The Ollama initialization with graceful fallback handling is well-structured. The service startup and model availability checks provide good resilience with informative logging.

cortex/providers/ollama_provider.py (2)

57-76: LGTM!

The constructor is well-designed with sensible defaults, environment variable support for OLLAMA_HOST, and clear parameter documentation. The caching mechanism for _available_models and _selected_model is appropriate.


193-249: LGTM!

The model discovery and selection logic is well-implemented with appropriate caching, preference ordering, and fallback handling. The logging provides good visibility into model selection decisions.

test_output.txt (1)

707-773: The test failures listed in the output do not reflect the current state of the codebase. The tests have been updated to expect Ollama as the primary routing provider with proper fallback to Claude/Kimi K2, exception handling in OllamaProvider.is_running() is implemented correctly, and test expectations for API key requirements and mock configurations are aligned with the implementation. No actionable issues remain—the review comment is based on an outdated snapshot of the tests.

Likely an incorrect or invalid review comment.

Comment on lines +375 to +383
print("\nPulling codellama:7b model (this may take a few minutes)...")
try:
subprocess.run(["ollama", "pull", "llama3.2"], check=True)
subprocess.run(["ollama", "pull", "codellama:7b"], check=True)
print("\n✓ Model ready!")
except subprocess.CalledProcessError:
print("\n⚠ Could not pull model - you can do this later with: ollama pull llama3.2")
print("\n⚠ Could not pull model - you can do this later with: ollama pull codellama:7b")

self.config["api_provider"] = "ollama"
self.config["ollama_model"] = "llama3.2"
self.config["ollama_model"] = "codellama:7b"
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify default model references across the codebase

echo "=== Checking for codellama:7b references ==="
rg -n "codellama:7b" --type=py --type=md

echo ""
echo "=== Checking for phi3:mini references ==="
rg -n "phi3:mini" --type=py --type=md

echo ""
echo "=== Checking setup_ollama.py for default model ==="
find . -name "setup_ollama.py" -type f | head -5

echo ""
echo "=== Checking ollama_provider.py ==="
find . -name "ollama_provider.py" -type f | head -5

Repository: cortexlinux/cortex

Length of output: 3407


🏁 Script executed:

cat -n cortex/providers/ollama_provider.py | head -80

Repository: cortexlinux/cortex

Length of output: 2876


🏁 Script executed:

sed -n '115,160p' scripts/setup_ollama.py

Repository: cortexlinux/cortex

Length of output: 1591


Critical model inconsistency: first_run_wizard.py hardcodes codellama:7b, which is not in OllamaProvider's preferred or fallback models.

The first-run wizard unconditionally pulls codellama:7b (lines 375-383), but:

  1. OllamaProvider does not list codellama:7b in PREFERRED_MODELS or FALLBACK_MODELS—it lists phi3:mini, llama3:8b, and mistral:7b as fallbacks
  2. Documentation consistently references phi3:mini as the default (lightweight, 1.9GB)
  3. The setup script (setup_ollama.py) offers user choice with codellama:7b as default, contradicting the wizard's approach

This creates a critical integration issue: the wizard pulls a model the provider code doesn't recognize in its preference hierarchy, potentially causing unexpected behavior during model selection fallback logic.

Resolution: first_run_wizard.py should either pull from the provider's PREFERRED_MODELS list (starting with deepseek-coder-v2:16b, fallback to phi3:mini) or consistently use the documented phi3:mini default to align with OllamaProvider's expectations and documentation.

🤖 Prompt for AI Agents
In cortex/first_run_wizard.py around lines 375-383, the wizard currently
hardcodes pulling "codellama:7b" which conflicts with OllamaProvider's model
lists; change it to derive the model choice from OllamaProvider instead (or at
minimum use the documented lightweight default "phi3:mini"). Import or reference
OllamaProvider, pick model = OllamaProvider.PREFERRED_MODELS[0] if present else
OllamaProvider.FALLBACK_MODELS[0] (or default to "phi3:mini"), use that variable
in the subprocess.run pull command and in self.config["ollama_model"], and
update the user-facing print messages to reflect the chosen model string.
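
A hedged sketch of that direction, assuming OllamaProvider exposes PREFERRED_MODELS and FALLBACK_MODELS class attributes as stated above (written standalone here rather than inside the wizard method):

# Sketch only: pick the wizard's model from OllamaProvider's lists instead of
# hardcoding codellama:7b. Attribute names are taken from this review comment.
import subprocess

from cortex.providers import OllamaProvider

preferred = list(getattr(OllamaProvider, "PREFERRED_MODELS", []))
fallback = list(getattr(OllamaProvider, "FALLBACK_MODELS", []))
model = (preferred or fallback or ["phi3:mini"])[0]

print(f"\nPulling {model} model (this may take a few minutes)...")
try:
    subprocess.run(["ollama", "pull", model], check=True)
    print("\n✓ Model ready!")
except subprocess.CalledProcessError:
    print(f"\n⚠ Could not pull model - you can do this later with: ollama pull {model}")

config = {"api_provider": "ollama", "ollama_model": model}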

Comment on lines +1 to +5
"""Cortex Providers Package"""

from cortex.providers.ollama_provider import OllamaProvider

__all__ = ["OllamaProvider"]
Contributor


🛠️ Refactor suggestion | 🟠 Major

Add module docstring per coding guidelines.

The file is missing a module-level docstring. Per the coding guidelines, "Docstrings required for all public APIs." This package initialization exposes the public OllamaProvider API and should include a docstring describing the providers package.

🔎 Proposed fix
+"""
+Cortex Providers Package
+
+This package provides LLM provider implementations for Cortex Linux.
+Currently includes:
+- OllamaProvider: Local LLM inference via Ollama
+"""
+
 """Cortex Providers Package"""
 
 from cortex.providers.ollama_provider import OllamaProvider

Note: The existing single-line docstring on line 1 is minimal. Consider replacing it with the more comprehensive version above.

As per coding guidelines, docstrings are required for all public APIs.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cortex/providers/__init__.py around lines 1 to 5, the module-level docstring
is too minimal (just a single-line) and the file must include a clear module
docstring describing the providers package and the public API it exposes; update
the top of the file to replace or expand the current docstring with a brief
multi-line description of the package purpose, the fact that it exposes
OllamaProvider as the public provider, and any intended usage notes or stability
guarantees so it satisfies the coding guideline requiring docstrings for all
public APIs.

Comment on lines 100 to 136
try:
    # Official Ollama installation script
    result = subprocess.run(
        ["curl", "-fsSL", "https://ollama.com/install.sh"],
        capture_output=True,
        text=True,
        timeout=60,
    )

    if result.returncode != 0:
        logger.error(f"Failed to download Ollama installer: {result.stderr}")
        return False

    # Execute installation script
    install_result = subprocess.run(
        ["sh", "-c", result.stdout],
        capture_output=True,
        text=True,
        timeout=300,
    )

    if install_result.returncode == 0:
        logger.info("✅ Ollama installed successfully")
        # Start Ollama service
        subprocess.run(["ollama", "serve"],
                       stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL,
                       start_new_session=True)
        time.sleep(2)  # Give service time to start
        return True
    else:
        logger.error(f"Ollama installation failed: {install_result.stderr}")
        return False

except Exception as e:
    logger.error(f"Error installing Ollama: {e}")
    return False
Contributor


⚠️ Potential issue | 🟠 Major

❓ Verification inconclusive

Script executed:

find . -name "ollama_provider.py" -type f

Repository: cortexlinux/cortex
Exit code: 0

stdout:

./cortex/providers/ollama_provider.py

Script executed:

head -150 cortex/providers/ollama_provider.py | tail -60

Repository: cortexlinux/cortex
Exit code: 0

stdout:

        
        Returns:
            True if installation succeeded, False otherwise
        """
        if OllamaProvider.is_installed():
            logger.info("✅ Ollama already installed")
            return True

        logger.info("📦 Installing Ollama...")
        try:
            # Official Ollama installation script
            result = subprocess.run(
                ["curl", "-fsSL", "https://ollama.com/install.sh"],
                capture_output=True,
                text=True,
                timeout=60,
            )
            
            if result.returncode != 0:
                logger.error(f"Failed to download Ollama installer: {result.stderr}")
                return False

            # Execute installation script
            install_result = subprocess.run(
                ["sh", "-c", result.stdout],
                capture_output=True,
                text=True,
                timeout=300,
            )

            if install_result.returncode == 0:
                logger.info("✅ Ollama installed successfully")
                # Start Ollama service
                subprocess.run(["ollama", "serve"], 
                             stdout=subprocess.DEVNULL, 
                             stderr=subprocess.DEVNULL,
                             start_new_session=True)
                time.sleep(2)  # Give service time to start
                return True
            else:
                logger.error(f"Ollama installation failed: {install_result.stderr}")
                return False

        except Exception as e:
            logger.error(f"Error installing Ollama: {e}")
            return False

    def is_running(self) -> bool:
        """
        Check if Ollama service is running.
        
        Returns:
            True if service is accessible, False otherwise
        """
        try:
            response = requests.get(
                f"{self.base_url}/api/tags",
                timeout=5
            )
            return response.status_code == 200

Script executed:

wc -l cortex/providers/ollama_provider.py

Repository: cortexlinux/cortex
Exit code: 0

stdout:

439 cortex/providers/ollama_provider.py

Script executed:

sed -n '75,140p' cortex/providers/ollama_provider.py

Repository: cortexlinux/cortex
Exit code: 0

stdout:

        self._selected_model: str | None = None

    @staticmethod
    def is_installed() -> bool:
        """
        Check if Ollama is installed on the system.
        
        Returns:
            True if Ollama is available, False otherwise
        """
        return shutil.which("ollama") is not None

    @staticmethod
    def install_ollama() -> bool:
        """
        Install Ollama on the system.
        
        Returns:
            True if installation succeeded, False otherwise
        """
        if OllamaProvider.is_installed():
            logger.info("✅ Ollama already installed")
            return True

        logger.info("📦 Installing Ollama...")
        try:
            # Official Ollama installation script
            result = subprocess.run(
                ["curl", "-fsSL", "https://ollama.com/install.sh"],
                capture_output=True,
                text=True,
                timeout=60,
            )
            
            if result.returncode != 0:
                logger.error(f"Failed to download Ollama installer: {result.stderr}")
                return False

            # Execute installation script
            install_result = subprocess.run(
                ["sh", "-c", result.stdout],
                capture_output=True,
                text=True,
                timeout=300,
            )

            if install_result.returncode == 0:
                logger.info("✅ Ollama installed successfully")
                # Start Ollama service
                subprocess.run(["ollama", "serve"], 
                             stdout=subprocess.DEVNULL, 
                             stderr=subprocess.DEVNULL,
                             start_new_session=True)
                time.sleep(2)  # Give service time to start
                return True
            else:
                logger.error(f"Ollama installation failed: {install_result.stderr}")
                return False

        except Exception as e:
            logger.error(f"Error installing Ollama: {e}")
            return False

    def is_running(self) -> bool:
        """
        Check if Ollama service is running.

Script executed:

sed -n '1,30p' cortex/providers/ollama_provider.py

Repository: cortexlinux/cortex
Exit code: 0

stdout:

#!/usr/bin/env python3
"""
Ollama Provider for Cortex Linux
Enables local LLM support for privacy-first, offline package management.

Features:
- Auto-detect Ollama installation
- Smart model selection (prefers code-focused models)
- Streaming responses
- Zero data sent to cloud
- Fully offline capable

Author: Cortex Linux Team
License: Apache 2.0
"""

import json
import logging
import os
import shutil
import subprocess
import time
from typing import Any, Generator

import requests

logger = logging.getLogger(__name__)


class OllamaProvider:

🌐 Web query:

Ollama install.sh checksum verification security

💡 Result:

Short answer: Ollama's install.sh (curl ... | sh) does not itself verify checksums; the safer approach is to download the installer or package, inspect the script, and verify the package SHA256 against the published release checksums (Ollama publishes SHA256 hashes on its GitHub releases). Steps below.

Why: the install script downloads a tarball over HTTPS and extracts it without checksum verification, so piping it directly to sh risks executing a tampered script if your network or TLS is compromised. [1][2]

Quick secure workflow (Linux):

  1. Download the install script and inspect before running:

  2. Or manually download the package and verify SHA256 (recommended):

    • ARCH=amd64
    • curl -fsSL -o ollama.tgz "https://ollama.com/download/ollama-linux-${ARCH}.tgz"
    • sha256sum ollama.tgz
    • Compare the output to the SHA256 published in the Ollama GitHub release assets (each release lists sha256 for its artifacts). If they match, extract and install. [3]

Example compare source (GitHub releases show sha256 lines for each asset). [3]

Notes / recommendations:

  • Always verify the script contents before piping to sh; prefer manual download + checksum verification when possible. [1][2][3]
  • Use an authenticated channel (HTTPS + valid certs) and up-to-date TLS/OS packages.
  • If you need, I can fetch the latest release SHA256 for your platform and give exact commands to verify it now — tell me your OS/arch.

Sources:

  • Ollama install.sh source (downloaded script). [1]
  • Ollama docs/install page recommending curl | sh. [2]
  • Ollama GitHub releases (assets include sha256 hashes). [3]

🌐 Web query:

Ollama official installation script recommended practices

💡 Result:

Here are concise recommended practices when using Ollama’s official install script (install.sh):

  1. Inspect the script before running (don’t blindly pipe to sh). Download and review https://ollama.com/install.sh first. [1]
  2. Use the safe curl flags shown in docs (curl -fsSL) and run with sh: curl -fsSL https://ollama.com/install.sh | sh, or download then run locally. [2]
  3. Prefer downloading+verifying the script/binary before executing if you require supply-chain safety (save file, check contents, run from disk). [1][2]
  4. Run with appropriate privileges: the script needs sudo/root to install to system locations; run as root or ensure sudo is available. [1]
  5. Ensure required tools are present (curl, awk, grep, sed, tee, xargs) and supported architecture (amd64/arm64); the script checks these. [1]
  6. For WSL: use WSL2 (script will reject WSL1). [1]
  7. Use OLLAMA_VERSION environment variable if you need a specific release. [1]
  8. For restricted/offline networks, use the script’s offline/download hints or pre-download bundles and re-run (script has support / improvements for offline installs). [3]
  9. After install, verify binary and service (ollama --version, check systemd status) and follow Ollama docs for post-install configuration and security (firewall, run as non-root daemon where possible). [2]
  10. Keep the installation up to date via the official update mechanism or re-run the installer from the official source; follow official docs for upgrades. [2]

Sources: official install script and docs, plus install-script offline-improvement PR.
[1] Ollama install.sh (official).
[2] Ollama documentation / installation guide.
[3] Ollama GitHub PR improving install.sh offline behavior.


Script executed:

grep -n "def " cortex/providers/ollama_provider.py | head -20

Repository: cortexlinux/cortex
Exit code: 0

stdout:

57:    def __init__(
78:    def is_installed() -> bool:
88:    def install_ollama() -> bool:
138:    def is_running(self) -> bool:
154:    def start_service(self) -> bool:
193:    def get_available_models(self) -> list[str]:
218:    def select_best_model(self) -> str | None:
251:    def pull_model(self, model_name: str) -> bool:
287:    def ensure_model_available(self) -> str | None:
313:    def complete(
374:    def _messages_to_prompt(self, messages: list[dict[str, str]]) -> str:
400:    def _stream_response(self, response: requests.Response) -> Generator[dict[str, Any], None, None]:
418:    def get_model_info(self, model_name: str) -> dict[str, Any] | None:

Script executed:

sed -n '80,95p' cortex/providers/ollama_provider.py

Repository: cortexlinux/cortex
Exit code: 0

stdout:

        Check if Ollama is installed on the system.
        
        Returns:
            True if Ollama is available, False otherwise
        """
        return shutil.which("ollama") is not None

    @staticmethod
    def install_ollama() -> bool:
        """
        Install Ollama on the system.
        
        Returns:
            True if installation succeeded, False otherwise
        """
        if OllamaProvider.is_installed():

Security concern: Install script executed without verification.

The install_ollama() method downloads and executes Ollama's remote installation script without inspection or verification. While this is the official installation method, Ollama's own documentation recommends downloading and inspecting the script first, or verifying the package SHA256 against published hashes on GitHub releases before execution. Current implementation skips these safeguards. Consider:

  1. Saving the script to a file and validating its contents before execution
  2. Using Ollama's package verification method (download tarball, verify SHA256 against GitHub releases)
  3. Documenting this security consideration in the method's docstring
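A minimal sketch of option 2 above, assuming the caller supplies the tarball URL and the expected hash taken from the matching GitHub release assets; the function name and destination path are illustrative and not part of the existing provider:

import hashlib
import urllib.request

def download_and_verify(url: str, expected_sha256: str, dest: str = "/tmp/ollama.tgz") -> str:
    """Download a release tarball and refuse to proceed unless its SHA256 matches the published hash."""
    urllib.request.urlretrieve(url, dest)
    digest = hashlib.sha256()
    with open(dest, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError("SHA256 mismatch - refusing to install")
    return dest  # only extract/install from this path after the check passes

Only after this verification would install_ollama() extract the archive or run the saved script.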

Comment on lines 123 to 129
                # Start Ollama service
                subprocess.run(["ollama", "serve"],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL,
                             start_new_session=True)
                time.sleep(2)  # Give service time to start
                return True

⚠️ Potential issue | 🔴 Critical

subprocess.run with start_new_session doesn't background the process properly.

Using subprocess.run() (which blocks) with start_new_session=True doesn't achieve backgrounding. The code will still wait for ollama serve to complete, which it never does since it's a long-running service. This should use subprocess.Popen instead (as done correctly in start_service() on line 172).

🔎 Proposed fix
             if install_result.returncode == 0:
                 logger.info("✅ Ollama installed successfully")
                 # Start Ollama service
-                subprocess.run(["ollama", "serve"], 
-                             stdout=subprocess.DEVNULL, 
-                             stderr=subprocess.DEVNULL,
-                             start_new_session=True)
+                subprocess.Popen(
+                    ["ollama", "serve"],
+                    stdout=subprocess.DEVNULL,
+                    stderr=subprocess.DEVNULL,
+                    start_new_session=True,
+                )
                 time.sleep(2)  # Give service time to start
                 return True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
                # Start Ollama service
                subprocess.run(["ollama", "serve"],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL,
                             start_new_session=True)
                time.sleep(2)  # Give service time to start
                return True
                # Start Ollama service
                subprocess.Popen(
                    ["ollama", "serve"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.DEVNULL,
                    start_new_session=True,
                )
                time.sleep(2)  # Give service time to start
                return True
🤖 Prompt for AI Agents
In cortex/providers/ollama_provider.py around lines 123 to 129, the code uses
subprocess.run([...], start_new_session=True) to start the long-running "ollama
serve" which blocks; replace that call with subprocess.Popen([...],
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, start_new_session=True) so
the service is started in the background without blocking (match the pattern
used in start_service() at line 172), then keep the existing time.sleep(2) and
return True (or store/close the Popen handle as done elsewhere if required).

]
else:
requirements = ["anthropic>=0.18.0", "openai>=1.0.0"]
requirements = ["anthropic>=0.18.0", "openai>=1.0.0", "requests>=2.32.4"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

latest anthropic Python package version PyPI 2025

💡 Result:

The latest anthropic PyPI release is version 0.75.0 (uploaded Nov 24, 2025). [1][2]

Sources:
[1] PyPI project page — anthropic.
[2] Safety DB package page — anthropic.


🌐 Web query:

latest openai Python package version PyPI 2025

💡 Result:

The latest OpenAI Python package on PyPI is version 2.14.0 (uploaded Dec 19, 2025). [1]

Reference:
[1] PyPI — openai project release (openai 2.14.0).


🌐 Web query:

latest requests Python package version PyPI 2025

💡 Result:

The latest requests release on PyPI is 2.32.5 (uploaded Aug 18–19, 2025). [1][2]

Sources:

  • PyPI project page for requests. [1]
  • Safety DB listing for requests (latest v2.32.5). [2]

🌐 Web query:

anthropic Python package security vulnerabilities CVE 2025

💡 Result:

Short answer: I could not find any CVEs in 2025 that specifically target the PyPI "anthropic" Python package. The only CVE listed for that package in public vulnerability databases is older (CVE‑2022‑23491). [1]

However, several 2025 CVEs affect Anthropic projects (mostly the claude-code tool and other Anthropic services). Notable 2025 entries:

  • CVE‑2025‑52882 — Improper authorization in @Anthropic‑ai/claude‑code (unauthorized websocket connections). [2]
  • CVE‑2025‑54795 / CVE‑2025‑55284 — Command‑injection / command execution issues in @Anthropic‑ai/claude‑code (multiple advisories/patches in 2025). [3][4]
  • CVE‑2025‑58764, CVE‑2025‑59041, CVE‑2025‑59829, CVE‑2025‑64755, CVE‑2025‑66032 — various command execution, arbitrary file write, symlink/permission bypass and related vulnerabilities reported against @Anthropic‑ai/claude‑code in 2025. [5][6][7]

Sources:

  • Anthropic (PyPI) vulnerability summary / SafetyDB (shows CVE‑2022‑23491 for anthropic package). [1]
  • GitHub advisory / NVD entry for Anthropic Slack MCP data exfiltration CVE‑2025‑34072. [2]
  • Snyk advisories for claude‑code (CVE‑2025‑52882, CVE‑2025‑54795, CVE‑2025‑55284). [3][4]
  • Vulert / vulnerability databases listing multiple 2025 CVEs for @Anthropic‑ai/claude‑code (CVE‑2025‑58764, CVE‑2025‑59041, CVE‑2025‑59829, CVE‑2025‑64755, CVE‑2025‑66032). [5][6][7]

If you want, I can:

  • fetch the specific CVE identifiers, publication dates and patched versions for each of the claude‑code CVEs; or
  • check whether a specific version of the anthropic Python package you use is affected.

🌐 Web query:

openai Python package security vulnerabilities CVE 2025

💡 Result:

Short answer: I found no CVE in 2025 for the official openai Python package on PyPI. Third‑party projects and plugins that reference "OpenAI" or "openai" have CVEs in 2025 (examples below). Sources:

  • No known vulnerabilities listed for the official "openai" PyPI package (vendor/scan page). [1]
  • CVE‑2025‑31843: Missing authorization in "OpenAI Tools for WordPress & WooCommerce" plugin. [2]
  • CVE‑2025‑61260: Command‑injection in OpenAI Codex CLI (tracked by PT Security/dbugs). [3]
  • PYSEC‑2025‑50 / CVE‑2025‑48887: ReDoS in vLLM's OpenAI tool parser (OSV advisory). [4]

If you want, I can:

  • run a deeper search for any newly assigned CVEs after Dec 24, 2025, or
  • monitor specific repos (openai on PyPI/GitHub) and send a timeline of any future CVE entries.

🌐 Web query:

requests Python package security vulnerabilities CVE 2025

💡 Result:

Short answer: Yes — several CVE-2025 entries affect the Python requests package (some are distro-packaging advisories listing CVE-2025 IDs; others are CPython vulnerabilities that can indirectly affect code using requests). Action: update requests to a patched release from your distro or to the latest upstream release, and also keep your Python interpreter patched.

Key references:

  • Vulnerability listing for python-requests (multiple CVE‑2025 IDs reported by Snyk). [1]
  • Example: ALAS / Amazon advisory for the related certificate-verification bug (CVE-2024-35195) fixed in requests 2.32.0 (shows how distro packages track/fix requests issues). [2]
  • CPython security issues in 2025 (e.g., CVE‑2025‑6069 HTMLParser quadratic‑complexity — can affect Python apps, including those using requests). [3]

If you want, tell me your environment (OS/distro, requests version, Python version) and I’ll list which of these CVEs are relevant to you and give exact upgrade/patch commands.


Update minimum dependency versions to more recent stable releases.

The specified minimum versions are outdated:

  • anthropic>=0.18.0 should be updated to at least 0.75.0 (current stable as of Nov 2025)
  • openai>=1.0.0 should be updated to at least 2.14.0 (current stable as of Dec 2025)
  • requests>=2.32.4 can be updated to 2.32.5 for the latest patch

While no active CVEs were found for these packages at the specified versions, using very old minimum versions (2+ years old for anthropic and openai) is risky for compatibility and future maintenance. Consider using more recent, well-tested stable releases or pinning exact versions for production stability.

🤖 Prompt for AI Agents
In setup.py around line 50, the requirements list uses outdated minimum
versions; update the entry to require newer stable minimums: set anthropic to
>=0.75.0, openai to >=2.14.0, and requests to >=2.32.5 (or pin exact versions if
you prefer reproducible installs), then run your dependency install/tests to
verify compatibility.
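For illustration, the fallback list with the newer floors suggested above would look roughly like this (re-run the test suite before adopting the pins):

requirements = ["anthropic>=0.75.0", "openai>=2.14.0", "requests>=2.32.5"]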

Comment on lines 41 to 58
    @patch('cortex.providers.ollama_provider.requests.get')
    def test_is_running(self, mock_get):
        """Test Ollama service detection."""
        # Test when running
        mock_response = Mock()
        mock_response.status_code = 200
        mock_get.return_value = mock_response
        mock_get.side_effect = None  # Clear any side effects

        provider = OllamaProvider()
        self.assertTrue(provider.is_running())

        # Test when not running - use RequestException
        from requests.exceptions import ConnectionError
        mock_get.side_effect = ConnectionError("Connection refused")

        provider2 = OllamaProvider()
        self.assertFalse(provider2.is_running())

⚠️ Potential issue | 🟡 Minor

Fix test structure and mock configuration.

Several issues in this test:

  1. Import placement (line 54): ConnectionError is imported inside the test method. Move imports to the top of the file for consistency with PEP 8.
  2. Inconsistent exception (line 53 comment vs line 55 code): Comment mentions "RequestException" but code uses ConnectionError. Update comment for accuracy.
  3. Unnecessary instance creation (line 57): Creates a second provider instance provider2 instead of reusing the first. This creates confusion and doesn't test the same instance behavior.
  4. Mock side_effect issue: Setting side_effect directly causes the actual test failure seen in test_output.txt. The mock should be configured to handle both cases properly.
🔎 Proposed fix

Move import to top of file:

from requests.exceptions import ConnectionError

Then update the test:

+from requests.exceptions import ConnectionError
+
 class TestOllamaProvider(unittest.TestCase):
     """Test Ollama provider functionality."""
 
     # ... other tests ...
 
     @patch('cortex.providers.ollama_provider.requests.get')
     def test_is_running(self, mock_get):
         """Test Ollama service detection."""
         # Test when running
         mock_response = Mock()
         mock_response.status_code = 200
         mock_get.return_value = mock_response
-        mock_get.side_effect = None  # Clear any side effects
         
         provider = OllamaProvider()
         self.assertTrue(provider.is_running())
         
-        # Test when not running - use RequestException
-        from requests.exceptions import ConnectionError
+        # Test when not running - use ConnectionError
         mock_get.side_effect = ConnectionError("Connection refused")
-        
-        provider2 = OllamaProvider()
-        self.assertFalse(provider2.is_running())
+        self.assertFalse(provider.is_running())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    @patch('cortex.providers.ollama_provider.requests.get')
    def test_is_running(self, mock_get):
        """Test Ollama service detection."""
        # Test when running
        mock_response = Mock()
        mock_response.status_code = 200
        mock_get.return_value = mock_response
        mock_get.side_effect = None  # Clear any side effects
        provider = OllamaProvider()
        self.assertTrue(provider.is_running())
        # Test when not running - use RequestException
        from requests.exceptions import ConnectionError
        mock_get.side_effect = ConnectionError("Connection refused")
        provider2 = OllamaProvider()
        self.assertFalse(provider2.is_running())
    @patch('cortex.providers.ollama_provider.requests.get')
    def test_is_running(self, mock_get):
        """Test Ollama service detection."""
        # Test when running
        mock_response = Mock()
        mock_response.status_code = 200
        mock_get.return_value = mock_response
        provider = OllamaProvider()
        self.assertTrue(provider.is_running())
        # Test when not running - use ConnectionError
        mock_get.side_effect = ConnectionError("Connection refused")
        self.assertFalse(provider.is_running())
🤖 Prompt for AI Agents
In tests/test_ollama_integration.py around lines 41-58, fix the test structure
and mock configuration: move "from requests.exceptions import ConnectionError"
to the top-level imports, update the comment to accurately say "ConnectionError"
instead of "RequestException", reuse the same provider instance instead of
creating provider2, and configure the mock correctly by clearing side_effect and
setting mock_get.return_value to a Mock(status_code=200) for the first
assertion, then set mock_get.side_effect = ConnectionError("Connection refused")
for the second assertion (or temporarily set mock_get.return_value = None and
side_effect accordingly) so the same provider.is_running() call is exercised for
both running and not-running cases without leaving side_effect set incorrectly.

…sts to enhance Ollama integration and error handling

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/test_llm_router.py (1)

1-782: Run Black formatter to fix code style.

The CI pipeline indicates this file needs Black formatting. Run black tests/test_llm_router.py to address the formatting issues.

setup.py (1)

1-90: Run Black formatter to fix code style.

The CI pipeline indicates this file needs Black formatting. Run black setup.py to address the formatting issues.

cortex/llm_router.py (1)

1-1028: Run Black formatter to fix code style.

The CI pipeline indicates this file needs Black formatting. Run black cortex/llm_router.py to address the formatting issues.

♻️ Duplicate comments (4)
tests/test_ollama_integration.py (2)

41-58: Fix test structure and mock configuration.

The issues identified in past reviews remain valid:

  1. Move from requests.exceptions import ConnectionError to top-level imports (line 54)
  2. Reuse the same provider instance instead of creating provider2 (line 57)
  3. The mock configuration with side_effect needs proper handling for both test cases

163-190: Fix mock configuration for router integration test.

The past review correctly identifies that the mock setup doesn't match how the router actually uses the Ollama provider. The response will contain a MagicMock instead of the expected string content, causing test failures.

scripts/setup_ollama.py (1)

203-205: Fix missing f-string prefix.

Line 204 is missing the f prefix for the f-string, so {model_name} won't be interpolated.

🔎 Proposed fix
     except subprocess.TimeoutExpired:
-        logger.warning("⚠️  Model download timed out - try again with: ollama pull {model_name}")
+        logger.warning(f"⚠️  Model download timed out - try again with: ollama pull {model_name}")
         return False
cortex/providers/ollama_provider.py (1)

122-130: Critical: Use Popen instead of subprocess.run for background service.

Line 125-128 uses subprocess.run() which blocks indefinitely waiting for ollama serve to complete. This should use subprocess.Popen() to start the service in the background, matching the pattern used in start_service() at line 173.

🔎 Proposed fix
             if install_result.returncode == 0:
                 logger.info("✅ Ollama installed successfully")
                 # Start Ollama service
-                subprocess.run(["ollama", "serve"],
-                             stdout=subprocess.DEVNULL,
-                             stderr=subprocess.DEVNULL,
-                             start_new_session=True)
+                subprocess.Popen(
+                    ["ollama", "serve"],
+                    stdout=subprocess.DEVNULL,
+                    stderr=subprocess.DEVNULL,
+                    start_new_session=True,
+                )
                 time.sleep(2)  # Give service time to start
                 return True
🧹 Nitpick comments (4)
setup.py (2)

22-32: Consider skipping Ollama setup in development mode.

Running the Ollama setup script on every develop install could be disruptive during iterative development. Consider checking for a skip environment variable or detecting virtual environments to avoid repeated setup attempts.

🔎 Proposed improvement
 class PostDevelopCommand(develop):
     """Post-development setup for Ollama."""
     def run(self):
         develop.run(self)
+        # Skip in development if already configured
+        if os.getenv("CORTEX_SKIP_OLLAMA_SETUP"):
+            print("ℹ️  Skipping Ollama setup (CORTEX_SKIP_OLLAMA_SETUP is set)")
+            return
         # Run Ollama setup after development install
         try:

51-51: Consider updating minimum dependency versions.

The fallback requirements specify anthropic>=0.18.0 and openai>=1.0.0, which are quite old (2+ years). While no active CVEs were found, using more recent stable versions (e.g., anthropic>=0.75.0, openai>=2.14.0) would improve compatibility and maintainability. However, this review uses the Chill profile, so treat this as optional unless long-term maintenance is a concern.

scripts/setup_ollama.py (1)

37-76: Consider adding security verification for installation script.

The function downloads and executes Ollama's installation script without verification. While this matches Ollama's official documentation, consider adding a note about security or optionally supporting SHA256 verification of the downloaded package for enhanced security posture.

cortex/providers/ollama_provider.py (1)

100-137: Security consideration: Installation script executed without verification.

The install_ollama() method downloads and directly executes Ollama's installation script without checksum verification. While this follows Ollama's official docs, consider documenting this security consideration in the method's docstring or adding optional SHA256 verification against published release checksums for enhanced security posture.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bed948 and 4679715.

📒 Files selected for processing (10)
  • cortex/llm/interpreter.py
  • cortex/llm_router.py
  • cortex/providers/ollama_provider.py
  • examples/ollama_demo.py
  • scripts/setup_ollama.py
  • setup.py
  • tests/integration/test_end_to_end.py
  • tests/test_cli.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • cortex/llm/interpreter.py
  • tests/test_cli.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • cortex/providers/ollama_provider.py
  • scripts/setup_ollama.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
  • cortex/llm_router.py
  • examples/ollama_demo.py
  • tests/integration/test_end_to_end.py
  • setup.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
  • tests/integration/test_end_to_end.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • setup.py
🧬 Code graph analysis (6)
cortex/providers/ollama_provider.py (2)
scripts/setup_ollama.py (1)
  • install_ollama (26-76)
cortex/llm_router.py (1)
  • complete (270-354)
scripts/setup_ollama.py (3)
cortex/providers/ollama_provider.py (1)
  • install_ollama (89-137)
cortex/logging_system.py (2)
  • info (198-200)
  • warning (202-204)
setup.py (2)
  • run (11-19)
  • run (24-32)
tests/test_llm_router.py (1)
cortex/llm_router.py (3)
  • route_task (208-268)
  • TaskType (35-45)
  • LLMProvider (48-53)
tests/test_ollama_integration.py (3)
cortex/llm_router.py (4)
  • LLMProvider (48-53)
  • TaskType (35-45)
  • route_task (208-268)
  • complete (270-354)
cortex/providers/ollama_provider.py (9)
  • OllamaProvider (31-440)
  • is_installed (79-86)
  • is_running (139-153)
  • get_available_models (194-217)
  • select_best_model (219-250)
  • pull_model (252-286)
  • ensure_model_available (288-312)
  • complete (314-373)
  • install_ollama (89-137)
scripts/setup_ollama.py (2)
  • setup_ollama (211-255)
  • install_ollama (26-76)
cortex/llm_router.py (1)
cortex/providers/ollama_provider.py (6)
  • OllamaProvider (31-440)
  • is_installed (79-86)
  • is_running (139-153)
  • start_service (155-192)
  • ensure_model_available (288-312)
  • complete (314-373)
examples/ollama_demo.py (2)
cortex/llm_router.py (3)
  • TaskType (35-45)
  • complete (270-354)
  • route_task (208-268)
cortex/providers/ollama_provider.py (6)
  • is_installed (79-86)
  • is_running (139-153)
  • start_service (155-192)
  • get_available_models (194-217)
  • ensure_model_available (288-312)
  • complete (314-373)
🪛 GitHub Actions: CI
cortex/providers/ollama_provider.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

scripts/setup_ollama.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

tests/test_llm_router.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

tests/test_ollama_integration.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

cortex/llm_router.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

examples/ollama_demo.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

tests/integration/test_end_to_end.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

setup.py

[error] 1-1: Black formatting check failed. 9 files would be reformatted by Black; run 'black --write' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.10)

Comment on lines +31 to +68
def check_ollama_status():
    """Check Ollama installation and available models."""
    console.print("\n[bold cyan]🔍 Checking Ollama Status[/bold cyan]")

    provider = OllamaProvider()

    # Check installation
    if provider.is_installed():
        console.print("✅ Ollama installed", style="green")
    else:
        console.print("❌ Ollama not installed", style="red")
        console.print("\nInstall with: curl -fsSL https://ollama.com/install.sh | sh")
        return False

    # Check service
    if provider.is_running():
        console.print("✅ Ollama service running", style="green")
    else:
        console.print("⚠️ Ollama service not running", style="yellow")
        console.print("Starting service...")
        if provider.start_service():
            console.print("✅ Service started", style="green")
        else:
            console.print("❌ Failed to start service", style="red")
            return False

    # List models
    models = provider.get_available_models()
    if models:
        console.print("\n[bold]Available Models:[/bold]")
        for model in models:
            console.print(f" • {model}", style="cyan")
    else:
        console.print("\n⚠️ No models installed", style="yellow")
        console.print("Install default model: ollama pull phi3:mini")
        return False

    return True

🛠️ Refactor suggestion | 🟠 Major

Add type hints and enhance docstrings.

The function check_ollama_status() lacks type hints and has a minimal docstring. Per coding guidelines, type hints are required and docstrings should describe parameters and return values.

🔎 Proposed fix
-def check_ollama_status():
-    """Check Ollama installation and available models."""
+def check_ollama_status() -> bool:
+    """
+    Check Ollama installation and available models.
+    
+    Returns:
+        bool: True if Ollama is ready, False otherwise
+    """
🤖 Prompt for AI Agents
In examples/ollama_demo.py around lines 31 to 68, the check_ollama_status
function is missing type hints and has a minimal docstring; update the function
signature to include return type hint -> bool and add a local type for provider
(provider: OllamaProvider) if OllamaProvider is importable, and expand the
docstring to a short one-line summary plus a "Returns" section describing the
boolean meaning (True = Ollama installed, running, and has models; False = any
check failed). Keep the implementation unchanged except for annotations and the
enhanced docstring.

Comment on lines 71 to 98
def demo_local_completion():
    """Demonstrate local LLM completion."""
    console.print("\n[bold cyan]💬 Testing Local Completion[/bold cyan]")

    provider = OllamaProvider()

    # Ensure model available
    model = provider.ensure_model_available()
    if not model:
        console.print("❌ No model available", style="red")
        return

    console.print(f"Using model: [cyan]{model}[/cyan]")

    # Test completion
    messages = [
        {"role": "user", "content": "How do I install nginx on Ubuntu? Be brief."}
    ]

    console.print("\n[yellow]Generating response...[/yellow]")
    start_time = time.time()

    response = provider.complete(messages=messages, temperature=0.7, max_tokens=200)

    elapsed = time.time() - start_time

    console.print(f"\n[bold]Response ({elapsed:.2f}s):[/bold]")
    console.print(Panel(response.get("response", "No response"), style="green"))

🛠️ Refactor suggestion | 🟠 Major

Add type hints and enhance docstrings.

The function demo_local_completion() lacks type hints and a detailed docstring.

🔎 Proposed fix
-def demo_local_completion():
-    """Demonstrate local LLM completion."""
+def demo_local_completion() -> None:
+    """
+    Demonstrate local LLM completion using Ollama.
+    
+    Shows model selection and response generation with timing.
+    """
🤖 Prompt for AI Agents
In examples/ollama_demo.py around lines 71 to 98, add type hints and a richer
docstring to demo_local_completion(): annotate the function as returning None
(-> None), type the local variables/parameters used (e.g., messages:
list[dict[str, str]] or from typing import List, Dict), and type the provider
and response where helpful (e.g., response: dict[str, Any]). Replace the short
docstring with a multi-line docstring that explains the function’s purpose, side
effects (prints to console), inputs used (none), and return value (None). If you
introduce typing names (List, Dict, Any), add the necessary imports at the top
of the file.

Comment on lines 101 to 136
def demo_router_with_fallback():
    """Demonstrate LLM router with fallback."""
    console.print("\n[bold cyan]🧭 Testing LLM Router[/bold cyan]")

    router = LLMRouter(prefer_local=True)

    # Test routing decision
    routing = router.route_task(TaskType.SYSTEM_OPERATION)
    console.print(f"\nRouting decision: [cyan]{routing.provider.value}[/cyan]")
    console.print(f"Reasoning: {routing.reasoning}")

    # Test completion
    messages = [
        {"role": "user", "content": "List 3 lightweight text editors for Ubuntu"}
    ]

    console.print("\n[yellow]Generating response...[/yellow]")
    start_time = time.time()

    try:
        response = router.complete(
            messages=messages,
            task_type=TaskType.SYSTEM_OPERATION,
            temperature=0.7,
            max_tokens=200
        )

        elapsed = time.time() - start_time

        console.print(f"\n[bold]Response from {response.provider.value} ({elapsed:.2f}s):[/bold]")
        console.print(Panel(response.content, style="green"))
        console.print(f"\nCost: ${response.cost_usd:.4f} | Tokens: {response.tokens_used}")

    except Exception as e:
        console.print(f"❌ Error: {e}", style="red")


🛠️ Refactor suggestion | 🟠 Major

Add type hints and enhance docstrings.

The function demo_router_with_fallback() lacks type hints and a detailed docstring.

🔎 Proposed fix
-def demo_router_with_fallback():
-    """Demonstrate LLM router with fallback."""
+def demo_router_with_fallback() -> None:
+    """
+    Demonstrate LLM router with fallback behavior.
+    
+    Shows routing decisions and completion with cost/token tracking.
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def demo_router_with_fallback():
    """Demonstrate LLM router with fallback."""
    console.print("\n[bold cyan]🧭 Testing LLM Router[/bold cyan]")
    router = LLMRouter(prefer_local=True)
    # Test routing decision
    routing = router.route_task(TaskType.SYSTEM_OPERATION)
    console.print(f"\nRouting decision: [cyan]{routing.provider.value}[/cyan]")
    console.print(f"Reasoning: {routing.reasoning}")
    # Test completion
    messages = [
        {"role": "user", "content": "List 3 lightweight text editors for Ubuntu"}
    ]
    console.print("\n[yellow]Generating response...[/yellow]")
    start_time = time.time()
    try:
        response = router.complete(
            messages=messages,
            task_type=TaskType.SYSTEM_OPERATION,
            temperature=0.7,
            max_tokens=200
        )
        elapsed = time.time() - start_time
        console.print(f"\n[bold]Response from {response.provider.value} ({elapsed:.2f}s):[/bold]")
        console.print(Panel(response.content, style="green"))
        console.print(f"\nCost: ${response.cost_usd:.4f} | Tokens: {response.tokens_used}")
    except Exception as e:
        console.print(f"❌ Error: {e}", style="red")
def demo_router_with_fallback() -> None:
    """
    Demonstrate LLM router with fallback behavior.
    Shows routing decisions and completion with cost/token tracking.
    """
    console.print("\n[bold cyan]🧭 Testing LLM Router[/bold cyan]")
    router = LLMRouter(prefer_local=True)
    # Test routing decision
    routing = router.route_task(TaskType.SYSTEM_OPERATION)
    console.print(f"\nRouting decision: [cyan]{routing.provider.value}[/cyan]")
    console.print(f"Reasoning: {routing.reasoning}")
    # Test completion
    messages = [
        {"role": "user", "content": "List 3 lightweight text editors for Ubuntu"}
    ]
    console.print("\n[yellow]Generating response...[/yellow]")
    start_time = time.time()
    try:
        response = router.complete(
            messages=messages,
            task_type=TaskType.SYSTEM_OPERATION,
            temperature=0.7,
            max_tokens=200
        )
        elapsed = time.time() - start_time
        console.print(f"\n[bold]Response from {response.provider.value} ({elapsed:.2f}s):[/bold]")
        console.print(Panel(response.content, style="green"))
        console.print(f"\nCost: ${response.cost_usd:.4f} | Tokens: {response.tokens_used}")
    except Exception as e:
        console.print(f"❌ Error: {e}", style="red")
🤖 Prompt for AI Agents
In examples/ollama_demo.py around lines 101 to 136, the function
demo_router_with_fallback() lacks type annotations and a descriptive docstring;
update the function to include a clear docstring describing purpose, behavior,
return value, and exceptions, and add type hints to the signature (e.g., ->
None) and to key local variables (router: LLMRouter, messages: list[dict[str,
str]], response: <appropriate response type or Any>) as well as the raised
exception handling, ensuring required types are imported or referenced via
typing.Any if concrete types are unavailable.

Comment on lines +138 to +157
def show_provider_comparison():
    """Show comparison between providers."""
    console.print("\n[bold cyan]📊 Provider Comparison[/bold cyan]\n")

    table = Table(title="LLM Provider Comparison")

    table.add_column("Feature", style="cyan")
    table.add_column("Ollama (Local)", style="green")
    table.add_column("Claude", style="yellow")
    table.add_column("OpenAI", style="blue")

    table.add_row("Privacy", "100% Local ✅", "Cloud", "Cloud")
    table.add_row("Cost", "$0", "$3-15/1M tokens", "$2-30/1M tokens")
    table.add_row("Offline", "Yes ✅", "No", "No")
    table.add_row("API Key", "Not needed ✅", "Required", "Required")
    table.add_row("Speed", "Varies by HW", "Fast", "Fast")
    table.add_row("Quality", "Good-Excellent", "Excellent", "Excellent")
    table.add_row("Setup", "Auto ✅", "Manual", "Manual")

    console.print(table)

🛠️ Refactor suggestion | 🟠 Major

Add type hints and enhance docstrings.

The function show_provider_comparison() lacks type hints and a detailed docstring.

🔎 Proposed fix
-def show_provider_comparison():
-    """Show comparison between providers."""
+def show_provider_comparison() -> None:
+    """
+    Display a comparison table of LLM providers.
+    
+    Compares Ollama, Claude, and OpenAI across multiple dimensions.
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def show_provider_comparison():
    """Show comparison between providers."""
    console.print("\n[bold cyan]📊 Provider Comparison[/bold cyan]\n")
    table = Table(title="LLM Provider Comparison")
    table.add_column("Feature", style="cyan")
    table.add_column("Ollama (Local)", style="green")
    table.add_column("Claude", style="yellow")
    table.add_column("OpenAI", style="blue")
    table.add_row("Privacy", "100% Local ✅", "Cloud", "Cloud")
    table.add_row("Cost", "$0", "$3-15/1M tokens", "$2-30/1M tokens")
    table.add_row("Offline", "Yes ✅", "No", "No")
    table.add_row("API Key", "Not needed ✅", "Required", "Required")
    table.add_row("Speed", "Varies by HW", "Fast", "Fast")
    table.add_row("Quality", "Good-Excellent", "Excellent", "Excellent")
    table.add_row("Setup", "Auto ✅", "Manual", "Manual")
    console.print(table)
def show_provider_comparison() -> None:
    """
    Display a comparison table of LLM providers.
    Compares Ollama, Claude, and OpenAI across multiple dimensions.
    """
    console.print("\n[bold cyan]📊 Provider Comparison[/bold cyan]\n")
    table = Table(title="LLM Provider Comparison")
    table.add_column("Feature", style="cyan")
    table.add_column("Ollama (Local)", style="green")
    table.add_column("Claude", style="yellow")
    table.add_column("OpenAI", style="blue")
    table.add_row("Privacy", "100% Local ✅", "Cloud", "Cloud")
    table.add_row("Cost", "$0", "$3-15/1M tokens", "$2-30/1M tokens")
    table.add_row("Offline", "Yes ✅", "No", "No")
    table.add_row("API Key", "Not needed ✅", "Required", "Required")
    table.add_row("Speed", "Varies by HW", "Fast", "Fast")
    table.add_row("Quality", "Good-Excellent", "Excellent", "Excellent")
    table.add_row("Setup", "Auto ✅", "Manual", "Manual")
    console.print(table)
🤖 Prompt for AI Agents
In examples/ollama_demo.py around lines 138 to 157, the function
show_provider_comparison() lacks type hints and has a minimal docstring; update
it to include a clear, descriptive docstring (what the function does, any side
effects like printing to console) and add Python type hints (e.g., def
show_provider_comparison() -> None), and if the function uses or relies on
module-level objects like console or Table, mention them in the docstring; keep
behavior unchanged and ensure the docstring follows the project's style
(one-line summary + optional longer description).

… and enhance test assertions across multiple files

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
cortex/llm_router.py (3)

591-596: Async path does not support Ollama routing.

The acomplete() method only handles CLAUDE and KIMI_K2 providers in the try block (lines 592-596). When routing.provider == LLMProvider.OLLAMA, it falls through to the else branch and calls _acomplete_kimi, which is incorrect behavior.

🔎 Proposed fix
         try:
-            if routing.provider == LLMProvider.CLAUDE:
+            if routing.provider == LLMProvider.OLLAMA:
+                # For now, run sync Ollama in thread pool
+                # TODO: Implement native async Ollama support
+                import asyncio
+                loop = asyncio.get_event_loop()
+                response = await loop.run_in_executor(
+                    None,
+                    lambda: self._complete_ollama(messages, temperature, max_tokens)
+                )
+            elif routing.provider == LLMProvider.CLAUDE:
                 response = await self._acomplete_claude(messages, temperature, max_tokens, tools)
             else:  # KIMI_K2
                 response = await self._acomplete_kimi(messages, temperature, max_tokens, tools)

608-614: Async fallback logic doesn't include Ollama.

The async fallback only toggles between Claude and Kimi K2, missing Ollama from the fallback chain. This is inconsistent with the sync complete() fallback behavior.

🔎 Proposed fix to align with sync fallback logic
             if self.enable_fallback:
-                fallback_provider = (
-                    LLMProvider.KIMI_K2
-                    if routing.provider == LLMProvider.CLAUDE
-                    else LLMProvider.CLAUDE
-                )
+                # Smart fallback priority matching sync path
+                if routing.provider == LLMProvider.OLLAMA:
+                    fallback_provider = (
+                        LLMProvider.CLAUDE
+                        if self.claude_client_async
+                        else LLMProvider.KIMI_K2 if self.kimi_client_async else None
+                    )
+                elif routing.provider == LLMProvider.CLAUDE:
+                    fallback_provider = (
+                        LLMProvider.KIMI_K2 if self.kimi_client_async else None
+                    )
+                else:  # KIMI_K2
+                    fallback_provider = (
+                        LLMProvider.CLAUDE if self.claude_client_async else None
+                    )
+                
+                if fallback_provider is None:
+                    raise RuntimeError("No fallback provider available")
+                    
                 logger.info(f"🔄 Attempting fallback to {fallback_provider.value}")

521-544: get_stats() doesn't include Ollama statistics.

The get_stats() method returns statistics for Claude and Kimi K2 but omits Ollama, even though provider_stats tracks Ollama (line 203). This inconsistency means Ollama usage won't be visible in stats.

🔎 Proposed fix
         with self._stats_lock:
             return {
                 "total_requests": self.request_count,
                 "total_cost_usd": round(self.total_cost_usd, 4),
                 "providers": {
+                    "ollama": {
+                        "requests": self.provider_stats[LLMProvider.OLLAMA]["requests"],
+                        "tokens": self.provider_stats[LLMProvider.OLLAMA]["tokens"],
+                        "cost_usd": round(self.provider_stats[LLMProvider.OLLAMA]["cost"], 4),
+                    },
                     "claude": {
                         "requests": self.provider_stats[LLMProvider.CLAUDE]["requests"],
                         "tokens": self.provider_stats[LLMProvider.CLAUDE]["tokens"],
                         "cost_usd": round(self.provider_stats[LLMProvider.CLAUDE]["cost"], 4),
                     },
                     "kimi_k2": {
                         "requests": self.provider_stats[LLMProvider.KIMI_K2]["requests"],
                         "tokens": self.provider_stats[LLMProvider.KIMI_K2]["tokens"],
                         "cost_usd": round(self.provider_stats[LLMProvider.KIMI_K2]["cost"], 4),
                     },
                 },
             }
♻️ Duplicate comments (2)
scripts/setup_ollama.py (1)

203-205: Missing f-string prefix in timeout error message.

The error message on line 204 uses {model_name} but lacks the f prefix, so the variable won't be interpolated.

🔎 Proposed fix
     except subprocess.TimeoutExpired:
-        logger.warning("⚠️  Model download timed out - try again with: ollama pull {model_name}")
+        logger.warning(f"⚠️  Model download timed out - try again with: ollama pull {model_name}")
         return False
cortex/providers/ollama_provider.py (1)

125-132: subprocess.run blocks indefinitely for long-running service.

The call to subprocess.run(["ollama", "serve"], ...) will block because ollama serve is a long-running daemon. Even with start_new_session=True, run() waits for completion. This should use subprocess.Popen like start_service() does at line 172.

🔎 Proposed fix
             if install_result.returncode == 0:
                 logger.info("✅ Ollama installed successfully")
                 # Start Ollama service
-                subprocess.run(
+                subprocess.Popen(
                     ["ollama", "serve"],
                     stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL,
                     start_new_session=True,
                 )
                 time.sleep(2)  # Give service time to start
                 return True
🧹 Nitpick comments (5)
scripts/setup_ollama.py (3)

40-61: Security: Remote script execution without verification.

The pattern of downloading and executing a remote script (curl | sh) is inherently risky. While this is Ollama's official installation method, consider adding a note in the docstring about this security consideration. The script could be tampered with if the network/TLS is compromised.

🔎 Suggested docstring update
 def install_ollama() -> bool:
     """
     Install Ollama using the official installation script.
 
+    Note: This downloads and executes a remote script. For enhanced security,
+    consider manually downloading and inspecting the script, or verifying the
+    package SHA256 against Ollama's GitHub releases.
+
     Returns:
         True if installation succeeded, False otherwise
     """

93-107: Service startup uses fixed sleep without health check.

start_ollama_service() uses a fixed 2-second sleep after starting the service. On slower systems, this may not be enough time; on faster systems, it wastes time. Consider implementing a retry loop with health checks similar to the pattern used in OllamaProvider.start_service().

🔎 Proposed improvement
     try:
         # Start Ollama in background
         subprocess.Popen(
             ["ollama", "serve"],
             stdout=subprocess.DEVNULL,
             stderr=subprocess.DEVNULL,
             start_new_session=True,
         )
 
-        # Give it a moment to start
-        time.sleep(2)
-        logger.info("✅ Ollama service started")
-        return True
+        # Wait for service to become ready
+        import requests
+        for _ in range(10):
+            time.sleep(1)
+            try:
+                resp = requests.get("http://localhost:11434/api/tags", timeout=2)
+                if resp.status_code == 200:
+                    logger.info("✅ Ollama service started")
+                    return True
+            except Exception:
+                pass
+        logger.warning("⚠️  Ollama service may not be fully ready")
+        return True
 
     except Exception as e:
         logger.warning(f"⚠️  Failed to start Ollama service: {e}")
         return False

130-132: Tuple unpacking issue with is_default variable.

The tuple unpacking (name, size, desc, *is_default) captures is_default as a list (empty or containing True). The check if is_default is truthy only when a fourth element is present ([True] is truthy) and falsy when it is absent (empty list), so the default marker behaves as intended, but the starred unpacking makes that easy to misread.

🔎 Clearer alternative
-    for i, (name, size, desc, *is_default) in enumerate(models, 1):
-        default_marker = " ⭐" if is_default else ""
+    for i, model in enumerate(models, 1):
+        name, size, desc = model[:3]
+        is_default = len(model) > 3 and model[3]
+        default_marker = " ⭐" if is_default else ""
cortex/llm_router.py (1)

388-390: Token estimation is a rough heuristic.

The token count estimation len(content.split()) * 1.3 is a very rough approximation. Consider documenting this limitation or using Ollama's actual token count if available in the response.
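A minimal sketch of the second option, assuming the Ollama response dict carries the prompt_eval_count / eval_count fields that /api/generate reports, with the existing heuristic kept as a fallback:

def estimate_tokens(ollama_response: dict, content: str) -> int:
    """Prefer Ollama's reported token counts; fall back to the rough word-based heuristic."""
    prompt_tokens = ollama_response.get("prompt_eval_count")
    output_tokens = ollama_response.get("eval_count")
    if prompt_tokens is not None and output_tokens is not None:
        return prompt_tokens + output_tokens
    return int(len(content.split()) * 1.3)  # ~1.3 tokens per whitespace-separated word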

cortex/providers/ollama_provider.py (1)

371-395: Message conversion could be improved for multi-turn conversations.

The _messages_to_prompt method produces a simple concatenated format. For models that support native chat, consider using Ollama's /api/chat endpoint instead of /api/generate for better multi-turn handling.
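A minimal sketch of that alternative, assuming Ollama's /api/chat endpoint, which accepts the role/content message list directly; base_url and the timeout are placeholders:

import requests

def chat(base_url: str, model: str, messages: list[dict[str, str]]) -> str:
    """Send the conversation to Ollama's chat endpoint instead of flattening it into one prompt."""
    resp = requests.post(
        f"{base_url}/api/chat",
        json={"model": model, "messages": messages, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["message"]["content"]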

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4679715 and b35b693.

📒 Files selected for processing (9)
  • cortex/llm/interpreter.py
  • cortex/llm_router.py
  • cortex/providers/ollama_provider.py
  • examples/ollama_demo.py
  • scripts/setup_ollama.py
  • setup.py
  • tests/integration/test_end_to_end.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/test_ollama_integration.py
  • cortex/llm/interpreter.py
  • setup.py
  • tests/integration/test_end_to_end.py
  • examples/ollama_demo.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • scripts/setup_ollama.py
  • tests/test_llm_router.py
  • cortex/llm_router.py
  • cortex/providers/ollama_provider.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_llm_router.py
🔇 Additional comments (18)
scripts/setup_ollama.py (3)

1-18: Module structure and imports look good.

The imports are well-organized, and the logging setup at module level is appropriate for a setup script.


21-23: LGTM!

Clean implementation using shutil.which for checking binary availability.


211-261: Main orchestration function looks well-structured.

The setup_ollama() function properly handles CI environments, skip flags, non-interactive mode, and provides clear user guidance. Good defensive programming.

tests/test_llm_router.py (5)

38-44: Test properly updated for Ollama-first routing.

The assertion assertIn(decision.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE]) correctly reflects the new routing behavior where Ollama is preferred but Claude is an acceptable fallback.


112-122: Fallback test updated appropriately.

The test now correctly expects any of the three providers when Ollama is unavailable, reflecting the intelligent fallback chain.


419-462: End-to-end test properly mocks Ollama provider.

The mock setup for OllamaProvider is well-structured, correctly simulating both the is_running() and complete() methods to test the Ollama-first routing path.


464-510: Fallback test correctly verifies Claude fallback after Ollama failure.

The test properly sets up Ollama to fail, then Kimi to fail, expecting Claude to succeed as the final fallback. This validates the complete fallback chain.


676-682: Batch test assertions correctly broadened.

The assertions now accept multiple providers for batch processing results, which aligns with the dynamic routing behavior based on provider availability.

cortex/llm_router.py (4)

51-53: Ollama enum member added correctly.

The new OLLAMA provider is properly added to the enum with a descriptive comment.


94-106: Cost structure properly includes Ollama with zero cost.

Good design decision to include Ollama in the cost tracking structure with zero costs since it runs locally.
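
A simplified sketch of what such a cost table can look like; the per-million-token figures are placeholders for illustration, not the router's real rates:

# USD per 1M tokens (input, output); local inference is free.
COSTS = {
    "ollama":  {"input": 0.0, "output": 0.0},
    "claude":  {"input": 3.0, "output": 15.0},   # placeholder pricing
    "kimi_k2": {"input": 0.6, "output": 2.5},    # placeholder pricing
}

def estimate_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    rate = COSTS[provider]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000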


149-167: Robust Ollama initialization with graceful degradation.

The initialization properly handles installation checks, service startup, and model availability, with appropriate fallback to cloud providers when Ollama is unavailable.


356-400: Ollama completion method implemented correctly.

The _complete_ollama method properly handles the Ollama API response format and provides a consistent LLMResponse structure.
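
As a rough sketch of handling that response format: the field names follow Ollama's /api/generate JSON, while the result dataclass is a stand-in, not the router's real LLMResponse.

import requests
from dataclasses import dataclass

@dataclass
class CompletionResult:  # stand-in for the router's response type
    content: str
    model: str
    tokens_used: int

def generate(prompt: str, model: str = "codellama:7b",
             base_url: str = "http://localhost:11434") -> CompletionResult:
    resp = requests.post(
        f"{base_url}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama reports real token counts, so no word-count estimate is needed.
    tokens = data.get("prompt_eval_count", 0) + data.get("eval_count", 0)
    return CompletionResult(content=data.get("response", ""),
                            model=data.get("model", model),
                            tokens_used=tokens)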

cortex/providers/ollama_provider.py (6)

1-28: Well-structured module with clear documentation.

The module docstring clearly outlines the features, and imports are properly organized. Good use of type hints from collections.abc.


39-54: Model preference lists are well-designed.

The PREFERRED_MODELS list prioritizes code-focused models, with sensible FALLBACK_MODELS as backup. This aligns with the stated goal of code/system task optimization.


148-152: Good exception handling for service check.

Using requests.RequestException as the catch-all for network errors is the correct approach, addressing the earlier concern about overly broad exception handling.
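
A compact sketch of that health check, assuming Ollama's default port:

import requests

def is_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the Ollama HTTP API answers on its default port."""
    try:
        return requests.get(base_url, timeout=2).status_code == 200
    except requests.RequestException:
        # Covers connection errors, timeouts, DNS failures, etc.
        return False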


170-187: Service startup with proper retry loop.

The start_service() method correctly uses Popen for the background process and implements a retry loop to check for service readiness. This is the correct pattern.
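
The pattern being described, in sketch form (reusing the is_running() check above; retry counts and sleeps are arbitrary, not the provider's actual values):

import subprocess
import time

def start_service(retries: int = 10, delay: float = 1.0) -> bool:
    """Start `ollama serve` in the background and wait until the API responds."""
    if is_running():
        return True
    # Detach stdout/stderr so the background daemon doesn't block the caller.
    subprocess.Popen(["ollama", "serve"],
                     stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    for _ in range(retries):
        time.sleep(delay)
        if is_running():
            return True
    return False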


260-282: Model pull implements streaming progress correctly.

The pull_model method properly handles the streaming response to show progress updates. Good user experience for long-running downloads.
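
A hedged sketch of that streaming pattern against Ollama's /api/pull endpoint; the progress formatting is simplified compared to the real implementation:

import json
import requests

def pull_model(name: str, base_url: str = "http://localhost:11434") -> None:
    """Download a model, printing status lines as Ollama streams them."""
    with requests.post(f"{base_url}/api/pull",
                       json={"name": name}, stream=True, timeout=None) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            update = json.loads(line)
            status = update.get("status", "")
            done, total = update.get("completed"), update.get("total")
            if done and total:
                print(f"{status}: {done / total:.0%}")
            else:
                print(status)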


310-369: Complete method is well-implemented.

The complete() method properly ensures service availability, handles model selection, and supports both streaming and non-streaming modes. Error handling is appropriate.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/integration/test_end_to_end.py (1)

124-135: Consider retaining exit code validation as a safety net.

The string-based assertions are reasonable for handling pip warnings, but dropping the result.succeeded() check means process-level failures (e.g., Docker issues, pip failures before tests run) won't be caught explicitly. The "passed" check should catch most issues, but adding exit code validation provides defense in depth.

Also, the comment on line 124 mentions "ignoring pip warnings" but the logic actually combines streams to find "passed" rather than filtering warnings.

🔎 Suggested improvement
         # Check that tests passed, ignoring pip warnings in stderr
         combined_output = f"{result.stdout}\n{result.stderr}"
+        # Ensure the process itself didn't crash (exit code check)
+        self.assertTrue(
+            result.succeeded() or "passed" in combined_output.lower(),
+            msg=f"Process failed with no test output.\nStdout: {result.stdout}\nStderr: {result.stderr}",
+        )
         self.assertIn(
             "passed",
             combined_output.lower(),
             msg=f"Tests did not pass.\nStdout: {result.stdout}\nStderr: {result.stderr}",
         )

Alternatively, if the current behavior is intentional, update the comment to reflect what the code actually does:

-        # Check that tests passed, ignoring pip warnings in stderr
+        # Check combined stdout/stderr for pytest results (allows warnings in stderr)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b35b693 and 561c356.

📒 Files selected for processing (1)
  • tests/integration/test_end_to_end.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/integration/test_end_to_end.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/integration/test_end_to_end.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Package
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.12)
🔇 Additional comments (1)
tests/integration/test_end_to_end.py (1)

118-118: Reasonable exclusion for Docker environment.

Ignoring test_ollama_integration.py is appropriate since the Ollama service won't be available in the Docker test container. This aligns with the PR's commit message about ignoring this test file in CI/Docker.

The Ollama integration tests work fine in regular CI but fail when run
inside the nested Docker container test environment. This is because:

1. The container environment has limited mocking capabilities
2. The fake API keys in CI (test-key-for-ci) can cause issues with real client initialization
3. These tests are still run in the main CI pipeline, just not in this nested container test

This only affects test_project_tests_run_inside_container, not the main test suite.
Use regex to specifically look for pytest's 'X failed' pattern instead of
just checking if 'failed' appears anywhere in output. This prevents false
positives from warnings that contain the word 'failed' but aren't about
test failures.
Replace unbounded quantifiers (\d+ and \s+) with:
- Bounded digit match: \d{1,5} (max 99999 test failures)
- Single space instead of \s+ (pytest always uses single space)

This eliminates any backtracking concerns while maintaining functionality.
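
A sketch of the described check; names and structure are illustrative, not the test file's actual code:

import re

def assert_pytest_passed(stdout: str, stderr: str) -> None:
    """Fail if pytest's summary reports any '<N> failed' entries."""
    combined = f"{stdout}\n{stderr}".lower()
    # Bounded quantifier avoids pathological backtracking on huge outputs.
    failed = re.search(r"(\d{1,5}) failed", combined)
    assert "passed" in combined, "no passing tests found in pytest output"
    assert failed is None, f"pytest reported failures: {failed.group(0)}"
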
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/integration/test_end_to_end.py (1)

133-133: Consider moving import re to the top of the file.

Per PEP 8, imports should be at the top of the file. While localizing the import works, placing it with the other imports at the top maintains consistency and improves readability.

🔎 Proposed fix

Add at the top of the file (around line 7):

import re

Then remove the local import at line 133.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 561c356 and 498bb94.

📒 Files selected for processing (1)
  • tests/integration/test_end_to_end.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/integration/test_end_to_end.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/integration/test_end_to_end.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Package
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.12)
🔇 Additional comments (3)
tests/integration/test_end_to_end.py (3)

20-21: LGTM!

The --root-user-action=ignore and --no-cache-dir flags are appropriate for ephemeral Docker container environments—they suppress unnecessary warnings and avoid wasting space on pip cache.


118-118: LGTM!

Excluding tests/test_ollama_integration.py is appropriate since nested Docker containers lack the Ollama service, and running those tests would produce false failures.


135-138: LGTM!

The bounded regex pattern (\d{1,5}) is a solid mitigation against ReDoS. The two-step verification—checking for "passed" and asserting zero failures—provides robust test outcome validation.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
tests/test_cli.py (1)

61-70: Verify Ollama-specific integration details in the test.

The test mocks the entire CommandInterpreter class, which bypasses verification that Ollama is actually being used as the provider. Consider adding assertions to verify:

  1. The interpreter was initialized with the correct provider ("ollama")
  2. The API key fallback to "ollama-local" occurs when no cloud API keys are set
  3. The _get_provider() method returns "ollama" when CORTEX_PROVIDER is set
🔎 Proposed enhanced test
 @patch.dict(os.environ, {"CORTEX_PROVIDER": "ollama"}, clear=True)
 @patch("cortex.cli.CommandInterpreter")
 def test_install_no_api_key(self, mock_interpreter_class):
     # With Ollama integration, no API key is needed - should succeed
     mock_interpreter = Mock()
     mock_interpreter.parse.return_value = ["apt update", "apt install docker"]
     mock_interpreter_class.return_value = mock_interpreter

     result = self.cli.install("docker", dry_run=True)
     self.assertEqual(result, 0)
+    
+    # Verify Ollama provider is used
+    mock_interpreter_class.assert_called_once()
+    call_args = mock_interpreter_class.call_args
+    self.assertEqual(call_args[0][1], "ollama")  # provider argument
+    self.assertEqual(call_args[0][0], "ollama-local")  # api_key argument
tests/test_llm_router.py (1)

678-684: Mock provider availability for deterministic batch testing.

The batch test should mock Ollama availability to ensure deterministic behavior. While the comment acknowledges that "providers may be different based on availability," this creates non-determinism in the test suite.

🔎 Recommended approach
+@patch("cortex.llm_router.OllamaProvider")
 @patch("cortex.llm_router.AsyncAnthropic")
 @patch("cortex.llm_router.AsyncOpenAI")
-def test_complete_batch(self, mock_async_openai, mock_async_anthropic):
+def test_complete_batch(self, mock_async_openai, mock_async_anthropic, mock_ollama_class):
     """Test batch completion with multiple requests."""
+    # Mock Ollama availability
+    mock_ollama = Mock()
+    mock_ollama.is_running.return_value = True
+    mock_ollama.complete.return_value = {
+        "response": "Ollama batch response",
+        "model": "codellama:latest"
+    }
+    mock_ollama_class.return_value = mock_ollama
+    
     # Mock responses
     mock_claude_response = Mock()
     # ... rest of mocking ...
     
     router = LLMRouter(claude_api_key="test-claude", kimi_api_key="test-kimi")
+    router.ollama_client = mock_ollama
     router.claude_client_async = mock_claude_client
     router.kimi_client_async = mock_kimi_client

     async def run_test():
         requests = [
             # ... requests ...
         ]

         responses = await router.complete_batch(requests, max_concurrent=2)
         self.assertEqual(len(responses), 2)
-        # With Ollama integration, providers may be different based on availability
-        self.assertIn(
-            responses[0].provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE, LLMProvider.KIMI_K2]
-        )
-        self.assertIn(
-            responses[1].provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE, LLMProvider.KIMI_K2]
-        )
+        # With Ollama mocked as available, both should use OLLAMA
+        self.assertEqual(responses[0].provider, LLMProvider.OLLAMA)
+        self.assertEqual(responses[1].provider, LLMProvider.OLLAMA)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 498bb94 and 362b43d.

📒 Files selected for processing (2)
  • tests/test_cli.py
  • tests/test_llm_router.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_llm_router.py
  • tests/test_cli.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_llm_router.py
  • tests/test_cli.py
🧬 Code graph analysis (2)
tests/test_llm_router.py (2)
cortex/llm_router.py (3)
  • route_task (208-268)
  • TaskType (35-45)
  • LLMProvider (48-53)
cortex/providers/ollama_provider.py (1)
  • is_running (141-152)
tests/test_cli.py (3)
cortex/llm/interpreter.py (1)
  • parse (275-336)
cortex/packages.py (1)
  • parse (383-427)
cortex/cli.py (1)
  • install (332-580)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.12)
  • GitHub Check: Test (Python 3.10)
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.11)
🔇 Additional comments (2)
tests/test_llm_router.py (2)

421-465: Well-structured Ollama integration test.

This test demonstrates good testing practices:

  • Explicitly mocks OllamaProvider.is_running() and complete() methods
  • Verifies the expected provider (OLLAMA) is used
  • Validates response content contains expected information
  • Properly isolates the Ollama integration from external dependencies

This approach should be applied to the routing tests (lines 38-100) for consistency and determinism.


469-513: Excellent fallback cascade test.

This test properly validates the fallback chain when providers fail:

  1. Ollama fails with exception
  2. Kimi K2 fails with exception
  3. Claude succeeds and returns the fallback response

The test correctly verifies both the final provider used and the response content. This demonstrates proper error handling and fallback behavior.

Comment on lines +38 to 100
     def test_user_chat_routes_to_ollama(self):
         """User chat tasks should route to Ollama by default."""
         decision = self.router.route_task(TaskType.USER_CHAT)
-        self.assertEqual(decision.provider, LLMProvider.CLAUDE)
+        # With Ollama integration, defaults to Ollama, but falls back to Claude if unavailable
+        self.assertIn(decision.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE])
         self.assertEqual(decision.task_type, TaskType.USER_CHAT)
         self.assertGreater(decision.confidence, 0.9)

-    def test_system_operation_routes_to_kimi(self):
-        """System operations should route to Kimi K2."""
+    def test_system_operation_routes_to_ollama(self):
+        """System operations should route to Ollama by default."""
         decision = self.router.route_task(TaskType.SYSTEM_OPERATION)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )
         self.assertEqual(decision.task_type, TaskType.SYSTEM_OPERATION)

-    def test_error_debugging_routes_to_kimi(self):
-        """Error debugging should route to Kimi K2."""
+    def test_error_debugging_routes_to_ollama(self):
+        """Error debugging should route to Ollama by default."""
         decision = self.router.route_task(TaskType.ERROR_DEBUGGING)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )

-    def test_requirement_parsing_routes_to_claude(self):
-        """Requirement parsing should route to Claude."""
+    def test_requirement_parsing_routes_to_ollama(self):
+        """Requirement parsing should route to Ollama by default."""
         decision = self.router.route_task(TaskType.REQUIREMENT_PARSING)
-        self.assertEqual(decision.provider, LLMProvider.CLAUDE)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(decision.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE])

-    def test_code_generation_routes_to_kimi(self):
-        """Code generation should route to Kimi K2."""
+    def test_code_generation_routes_to_ollama(self):
+        """Code generation should route to Ollama by default."""
         decision = self.router.route_task(TaskType.CODE_GENERATION)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )

-    def test_dependency_resolution_routes_to_kimi(self):
-        """Dependency resolution should route to Kimi K2."""
+    def test_dependency_resolution_routes_to_ollama(self):
+        """Dependency resolution should route to Ollama by default."""
         decision = self.router.route_task(TaskType.DEPENDENCY_RESOLUTION)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )

-    def test_configuration_routes_to_kimi(self):
-        """Configuration tasks should route to Kimi K2."""
+    def test_configuration_routes_to_ollama(self):
+        """Configuration tasks should route to Ollama by default."""
         decision = self.router.route_task(TaskType.CONFIGURATION)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )

-    def test_tool_execution_routes_to_kimi(self):
-        """Tool execution should route to Kimi K2."""
+    def test_tool_execution_routes_to_ollama(self):
+        """Tool execution should route to Ollama by default."""
         decision = self.router.route_task(TaskType.TOOL_EXECUTION)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        # With Ollama integration, defaults to Ollama, but falls back if unavailable
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.KIMI_K2, LLMProvider.CLAUDE]
+        )

Contributor

🛠️ Refactor suggestion | 🟠 Major

Mock Ollama availability to prevent non-deterministic test behavior.

These routing tests have been updated to accept Ollama as a possible provider, but they don't explicitly mock Ollama's availability. This creates non-deterministic behavior:

  1. If Ollama is running on the test machine, tests route to OLLAMA
  2. If Ollama is not running, tests fall back to CLAUDE or KIMI_K2

The overly permissive assertions (assertIn(decision.provider, [LLMProvider.OLLAMA, ...])) mask this non-determinism but reduce test value by not verifying the specific routing logic.

🔎 Recommended approach: Split into separate test scenarios

Create explicit test cases for each scenario:

@patch("cortex.llm_router.OllamaProvider")
def test_user_chat_routes_to_ollama_when_available(self, mock_ollama_class):
    """User chat routes to Ollama when Ollama is available."""
    mock_ollama = Mock()
    mock_ollama.is_running.return_value = True
    mock_ollama_class.return_value = mock_ollama
    
    self.router.ollama_client = mock_ollama
    decision = self.router.route_task(TaskType.USER_CHAT)
    
    self.assertEqual(decision.provider, LLMProvider.OLLAMA)
    self.assertEqual(decision.task_type, TaskType.USER_CHAT)

@patch("cortex.llm_router.OllamaProvider")
def test_user_chat_falls_back_to_claude_when_ollama_unavailable(self, mock_ollama_class):
    """User chat falls back to Claude when Ollama is unavailable."""
    mock_ollama = Mock()
    mock_ollama.is_running.return_value = False
    mock_ollama_class.return_value = mock_ollama
    
    self.router.ollama_client = mock_ollama
    decision = self.router.route_task(TaskType.USER_CHAT)
    
    self.assertEqual(decision.provider, LLMProvider.CLAUDE)

Apply this pattern to all eight routing tests (lines 38-100).

🤖 Prompt for AI Agents
In tests/test_llm_router.py around lines 38-100 the routing tests accept Ollama
or fallbacks but do not mock Ollama availability, causing non-deterministic
behavior; split each existing test into two deterministic scenarios (Ollama
available and Ollama unavailable) by patching cortex.llm_router.OllamaProvider,
creating a mock_ollama with is_running returning True for the "available" test
and False for the "unavailable" test, assign self.router.ollama_client =
mock_ollama (or set the provider lookup to use the mock) and assert the exact
expected provider and task_type in each case (e.g., expect LLMProvider.OLLAMA
when True, expect the specific fallback like LLMProvider.CLAUDE or KIMI_K2 when
False).

Comment on lines +112 to 123
     def test_fallback_when_ollama_unavailable(self):
         """Should fallback to cloud providers if Ollama unavailable."""
         router = LLMRouter(
-            claude_api_key=None,
-            kimi_api_key="test-kimi-key",
-            enable_fallback=True,  # No Claude
+            claude_api_key="test-claude-key", kimi_api_key="test-kimi-key", enable_fallback=True
         )

-        # User chat normally goes to Claude, should fallback to Kimi
+        # If Ollama unavailable, should fallback to cloud providers
         decision = router.route_task(TaskType.USER_CHAT)
-        self.assertEqual(decision.provider, LLMProvider.KIMI_K2)
+        self.assertIn(
+            decision.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE, LLMProvider.KIMI_K2]
+        )

Contributor

🛠️ Refactor suggestion | 🟠 Major

Mock Ollama as unavailable to match test intent.

The test name test_fallback_when_ollama_unavailable clearly indicates that Ollama should be unavailable, but the test doesn't enforce this condition. Without mocking OllamaProvider.is_running() to return False, the test behavior depends on whether Ollama is actually running.

Additionally, the assertion includes LLMProvider.OLLAMA in the accepted list, which contradicts the test's intent to verify fallback behavior when Ollama is unavailable.

🔎 Proposed fix
 @patch.dict(os.environ, {}, clear=True)
-def test_fallback_when_ollama_unavailable(self):
+@patch("cortex.llm_router.OllamaProvider")
+def test_fallback_when_ollama_unavailable(self, mock_ollama_class):
     """Should fallback to cloud providers if Ollama unavailable."""
+    # Mock Ollama as unavailable
+    mock_ollama = Mock()
+    mock_ollama.is_running.return_value = False
+    mock_ollama_class.return_value = mock_ollama
+    
     router = LLMRouter(
         claude_api_key="test-claude-key", kimi_api_key="test-kimi-key", enable_fallback=True
     )
+    router.ollama_client = mock_ollama

-    # If Ollama unavailable, should fallback to cloud providers
+    # Should fallback to cloud providers when Ollama unavailable
     decision = router.route_task(TaskType.USER_CHAT)
-    self.assertIn(
-        decision.provider, [LLMProvider.OLLAMA, LLMProvider.CLAUDE, LLMProvider.KIMI_K2]
-    )
+    # Should not route to Ollama since it's unavailable
+    self.assertIn(decision.provider, [LLMProvider.CLAUDE, LLMProvider.KIMI_K2])
+    self.assertNotEqual(decision.provider, LLMProvider.OLLAMA)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/test_llm_router.py around lines 112 to 123, the test named
test_fallback_when_ollama_unavailable does not mock OllamaProvider.is_running()
to simulate Ollama being down and still allows LLMProvider.OLLAMA in the
expected results; update the test to patch/mock OllamaProvider.is_running() (or
the router’s check) to return False so Ollama is treated as unavailable, then
change the assertion to only accept cloud providers (remove LLMProvider.OLLAMA)
as valid decisions; keep enable_fallback=True and ensure the mock is applied for
the duration of the test.

@Suyashd999
Collaborator

@sujay-d07 the Ollama prompt doesn't pop up, please check

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cortex/llm_router.py (2)

569-592: Missing Ollama stats in get_stats() return value.

Ollama usage is tracked in provider_stats[LLMProvider.OLLAMA] (line 210), but get_stats() only returns Claude and Kimi K2 stats. Add Ollama stats to the returned dictionary.

🔎 Proposed fix
         with self._stats_lock:
             return {
                 "total_requests": self.request_count,
                 "total_cost_usd": round(self.total_cost_usd, 4),
                 "providers": {
+                    "ollama": {
+                        "requests": self.provider_stats[LLMProvider.OLLAMA]["requests"],
+                        "tokens": self.provider_stats[LLMProvider.OLLAMA]["tokens"],
+                        "cost_usd": round(self.provider_stats[LLMProvider.OLLAMA]["cost"], 4),
+                    },
                     "claude": {
                         "requests": self.provider_stats[LLMProvider.CLAUDE]["requests"],

639-643: Async completion doesn't support Ollama provider.

The acomplete() method only handles CLAUDE and KIMI_K2 (lines 640-643). If routing selects OLLAMA for an async call, it will incorrectly fall through to the Kimi handler.

Either add async Ollama support or explicitly check and reject Ollama in async context:

if routing.provider == LLMProvider.OLLAMA:
    raise RuntimeError("Async completion not supported for Ollama. Use complete() instead.")
♻️ Duplicate comments (4)
tests/test_llm_router.py (2)

38-100: Non-deterministic test behavior due to missing Ollama availability mocks.

These routing tests accept multiple providers (OLLAMA, CLAUDE, KIMI_K2) as valid outcomes, but don't explicitly control Ollama availability. Test behavior depends on whether Ollama is running on the test machine, reducing test value.

Per the past review comment, consider splitting into deterministic scenarios by mocking OllamaProvider.is_running() to return True or False explicitly.


111-122: Test contradicts its intent by accepting OLLAMA in the assertion.

The test test_fallback_when_ollama_unavailable should verify fallback behavior when Ollama is unavailable, but the assertion at line 121 still includes LLMProvider.OLLAMA as valid. Mock OllamaProvider.is_running() to return False and remove OLLAMA from the accepted providers.

tests/test_ollama_integration.py (1)

41-59: Test structure issues: import placement and instance reuse.

Per the past review comment:

  1. Move ConnectionError import to module level
  2. Comment says "RequestException" but code uses ConnectionError - update comment
  3. Reuse provider instead of creating provider2
setup.py (1)

124-124: Update minimum dependency versions.

Per the past review comment, the minimum versions are significantly outdated:

  • anthropic>=0.18.0 → consider >=0.75.0
  • openai>=1.0.0 → consider >=2.14.0
  • requests>=2.32.4 → consider >=2.32.5

Using older minimums may cause compatibility issues with newer API features.

🧹 Nitpick comments (8)
scripts/setup_ollama.py (1)

23-26: Add type hint for return type consistency.

While the function has a return type hint, consider adding explicit -> bool for consistency with other functions in the file.

cortex/_post_install.py (1)

30-44: Consider more specific exception handling and import validation.

The broad except Exception catches all errors, which may hide unexpected issues. Additionally, the dynamic import assumes scripts.setup_ollama is available, which may not be true if the package wasn't installed correctly.

🔎 Proposed improvements
     try:
         # Import and run the setup function
+        try:
+            from scripts.setup_ollama import setup_ollama
+        except ImportError as ie:
+            print(f"⚠️  Cannot import setup_ollama: {ie}")
+            print("ℹ️  The scripts package may not be installed correctly")
+            print("ℹ️  Try reinstalling: pip install -e .")
+            return
-        from scripts.setup_ollama import setup_ollama

         setup_ollama()

         # Create marker file to prevent running again
         os.makedirs(os.path.dirname(marker_file), exist_ok=True)
         with open(marker_file, "w") as f:
             f.write("Setup completed\n")

-    except Exception as e:
+    except KeyboardInterrupt:
+        print("\n⚠️  Setup cancelled by user")
+        print("ℹ️  You can run it manually later with: cortex-setup-ollama")
+    except OSError as e:
+        print(f"⚠️  Filesystem error during setup: {e}")
+        print("ℹ️  You can run it manually later with: cortex-setup-ollama")
+    except Exception as e:
         print(f"⚠️  Ollama setup encountered an issue: {e}")
         print("ℹ️  You can run it manually later with: cortex-setup-ollama")

More specific exception handling helps with debugging and provides clearer error messages to users.

scripts/verify_ollama_setup.sh (1)

12-16: Remove unused YELLOW variable.

The YELLOW variable is defined but never used in the script. Either remove it or use it for warning messages to maintain clean code.

 # Colors
 GREEN='\033[0;32m'
 RED='\033[0;31m'
-YELLOW='\033[1;33m'
 NC='\033[0m' # No Color
tests/test_ollama_integration.py (1)

18-18: Remove unused MagicMock import.

MagicMock is imported but not used in the test file. Only Mock is used.

-from unittest.mock import MagicMock, Mock, patch
+from unittest.mock import Mock, patch
setup.py (1)

10-54: Extract common post-install logic to reduce duplication.

PostInstallCommand and PostDevelopCommand have nearly identical run() implementations. Consider extracting the common logic into a helper function.

🔎 Proposed refactor
def _run_ollama_setup():
    """Common post-install Ollama setup logic."""
    print("\n" + "=" * 70)
    print("🚀 Running Cortex post-installation setup...")
    print("=" * 70 + "\n")
    try:
        from scripts.setup_ollama import setup_ollama
        setup_ollama()
    except Exception as e:
        print(f"⚠️  Ollama setup encountered an issue: {e}")
        print("ℹ️  You can run it manually later with: cortex-setup-ollama")
    finally:
        print("\n" + "=" * 70)
        print("💡 TIP: If Ollama setup didn't run, execute: cortex-setup-ollama")
        print("=" * 70)


class PostInstallCommand(install):
    def run(self):
        install.run(self)
        _run_ollama_setup()


class PostDevelopCommand(develop):
    def run(self):
        develop.run(self)
        _run_ollama_setup()
tests/test_ollama_setup_integration.py (2)

7-11: Remove unused imports.

shutil and tempfile are imported but never used in the file.

 import os
-import shutil
 import sys
-import tempfile
 from pathlib import Path

18-92: Add type hints to test functions.

Per the coding guidelines, type hints are required for Python code. Add return type annotations to the test functions.

def test_setup_import() -> bool:
def test_setup_execution() -> bool:
def test_package_structure() -> bool:
def test_manifest_includes() -> bool:
def main() -> int:
cortex/llm_router.py (1)

436-448: Use Ollama's actual token counts instead of word-count estimation.

The current estimation (len(content.split()) * 1.3) is imprecise. Ollama's /api/generate response includes eval_count (output tokens) and prompt_eval_count (input tokens) fields that provide accurate counts.

🔎 Proposed improvement
         content = response_data.get("response", "")
         model = response_data.get("model", "unknown")

-        # Ollama doesn't provide token counts in the same way
-        # Estimate based on response length
-        tokens_used = len(content.split()) * 1.3  # Rough estimate
+        # Use actual token counts from Ollama response
+        eval_count = response_data.get("eval_count", 0)
+        prompt_eval_count = response_data.get("prompt_eval_count", 0)
+        tokens_used = eval_count + prompt_eval_count
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 362b43d and c4d35f1.

📒 Files selected for processing (17)
  • AGENTS.md
  • MANIFEST.in
  • README.md
  • cortex/_post_install.py
  • cortex/llm/interpreter.py
  • cortex/llm_router.py
  • docs/AUTOMATIC_OLLAMA_SETUP.md
  • docs/OLLAMA_AUTO_SETUP_IMPLEMENTATION.md
  • docs/OLLAMA_AUTO_SETUP_QUICKREF.md
  • pyproject.toml
  • scripts/__init__.py
  • scripts/setup_ollama.py
  • scripts/verify_ollama_setup.sh
  • setup.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
  • tests/test_ollama_setup_integration.py
✅ Files skipped from review due to trivial changes (2)
  • scripts/init.py
  • MANIFEST.in
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*.py: Follow PEP 8 style guide
Type hints required in Python code
Docstrings required for all public APIs

Files:

  • tests/test_ollama_setup_integration.py
  • tests/test_llm_router.py
  • cortex/llm/interpreter.py
  • cortex/llm_router.py
  • scripts/setup_ollama.py
  • tests/test_ollama_integration.py
  • setup.py
  • cortex/_post_install.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Maintain >80% test coverage for pull requests

Files:

  • tests/test_ollama_setup_integration.py
  • tests/test_llm_router.py
  • tests/test_ollama_integration.py
{setup.py,setup.cfg,pyproject.toml,**/__init__.py}

📄 CodeRabbit inference engine (AGENTS.md)

Use Python 3.10 or higher as the minimum supported version

Files:

  • setup.py
  • pyproject.toml
**/*install*.py

📄 CodeRabbit inference engine (AGENTS.md)

**/*install*.py: Dry-run by default for all installations in command execution
No silent sudo execution - require explicit user confirmation
Implement audit logging to ~/.cortex/history.db for all package operations

Files:

  • cortex/_post_install.py
🧬 Code graph analysis (8)
tests/test_ollama_setup_integration.py (1)
scripts/setup_ollama.py (1)
  • setup_ollama (358-424)
tests/test_llm_router.py (1)
cortex/providers/ollama_provider.py (3)
  • is_running (141-152)
  • is_installed (79-86)
  • select_best_model (215-246)
cortex/llm/interpreter.py (1)
cortex/providers/ollama_provider.py (4)
  • is_running (141-152)
  • start_service (154-191)
  • select_best_model (215-246)
  • complete (310-369)
cortex/llm_router.py (1)
cortex/providers/ollama_provider.py (6)
  • OllamaProvider (31-436)
  • is_installed (79-86)
  • is_running (141-152)
  • start_service (154-191)
  • select_best_model (215-246)
  • complete (310-369)
scripts/setup_ollama.py (2)
cortex/providers/ollama_provider.py (1)
  • install_ollama (89-139)
cortex/logging_system.py (1)
  • info (200-202)
tests/test_ollama_integration.py (3)
cortex/llm_router.py (5)
  • LLMProvider (48-53)
  • LLMRouter (79-860)
  • TaskType (35-45)
  • route_task (231-291)
  • complete (293-402)
cortex/providers/ollama_provider.py (7)
  • OllamaProvider (31-436)
  • is_installed (79-86)
  • get_available_models (193-213)
  • select_best_model (215-246)
  • pull_model (248-282)
  • complete (310-369)
  • install_ollama (89-139)
scripts/setup_ollama.py (2)
  • setup_ollama (358-424)
  • install_ollama (28-108)
setup.py (1)
scripts/setup_ollama.py (1)
  • setup_ollama (358-424)
cortex/_post_install.py (1)
scripts/setup_ollama.py (1)
  • setup_ollama (358-424)
🪛 LanguageTool
docs/OLLAMA_AUTO_SETUP_QUICKREF.md

[style] ~128-~128: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...om --- Last Updated: December 25, 2025 Status: ✅ Complete and Tested

(MISSING_COMMA_AFTER_YEAR)

🪛 markdownlint-cli2 (0.18.1)
docs/OLLAMA_AUTO_SETUP_IMPLEMENTATION.md

10-10: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


10-10: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


48-48: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


57-57: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


118-118: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


167-167: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


167-167: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


243-243: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


243-243: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


287-287: Bare URL used

(MD034, no-bare-urls)


288-288: Bare URL used

(MD034, no-bare-urls)


289-289: Bare URL used

(MD034, no-bare-urls)

docs/OLLAMA_AUTO_SETUP_QUICKREF.md

69-69: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


69-69: Strong style
Expected: asterisk; Actual: underscore

(MD050, strong-style)


122-122: Bare URL used

(MD034, no-bare-urls)


123-123: Bare URL used

(MD034, no-bare-urls)


124-124: Bare URL used

(MD034, no-bare-urls)

docs/AUTOMATIC_OLLAMA_SETUP.md

45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


96-96: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


211-211: Bare URL used

(MD034, no-bare-urls)


212-212: Bare URL used

(MD034, no-bare-urls)


213-213: Bare URL used

(MD034, no-bare-urls)

🪛 Shellcheck (0.11.0)
scripts/verify_ollama_setup.sh

[warning] 15-15: YELLOW appears unused. Verify use (or export if used externally).

(SC2034)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Package
  • GitHub Check: test (3.12)
  • GitHub Check: test (3.11)
🔇 Additional comments (20)
pyproject.toml (1)

10-10: LGTM! Valid license format update.

The change from string to table format (license = {text = "Apache-2.0"}) aligns with PEP 621 standards and resolves setuptools warnings, as noted in the implementation documentation.

README.md (2)

66-74: Excellent local-first messaging with clear benefits.

The updated features table effectively communicates the privacy-first, offline-capable, and zero-cost value proposition. The emphasis on Ollama as the default with cloud fallback is clear and compelling.


100-113: Verify automatic setup triggers on first use.

The documentation states that "setup will run automatically when you first use cortex" and that users will be "prompted to select which Ollama model to download." However, a PR comment from @Suyashd999 reports that "the ollama prompt doesn't pop up."

This suggests the automatic setup may not be triggering as documented. To properly verify this, the following need to be checked:

  • Whether setup.py uses post-install hooks (which typically fail with wheel installs) or runtime initialization
  • Whether the _post_install.py wiring is correct and properly triggered on first use
  • The actual entry point configuration and when initialization runs
docs/OLLAMA_AUTO_SETUP_IMPLEMENTATION.md (1)

1-304: Documentation is comprehensive and well-structured.

The implementation summary effectively documents the auto-setup feature, including architecture, flow diagrams, testing procedures, and troubleshooting guidance. This will be valuable for maintainers and contributors.

scripts/setup_ollama.py (2)

142-189: Good interactive UX with sensible defaults.

The model selection prompt is well-designed with clear options, size information, and a default choice. The error handling gracefully falls back to the default model on invalid input.


358-428: Verify interactive prompt behavior and marker file logic during installation.

The review concern about sys.stdin.isatty() (line 375) failing during pip install -e . is a real and documented issue. When post-install hooks run in non-TTY environments, isatty() correctly returns false, causing the interactive prompts to be skipped. This is expected behavior, not a bug.

However, to ensure the review comment addresses the actual implementation accurately, verify:

  1. Whether cortex/_post_install.py contains a marker file check that prevents re-runs after initial install
  2. Whether AGENTS.md (lines 20-31) mentions the python setup.py develop vs pip install -e . distinction for TTY preservation
  3. Whether post-install behavior is documented for users encountering skipped prompts

If users cannot run interactive setup after installation, consider:

  • Adding a cortex-setup-ollama standalone command that can be run manually from an interactive terminal (already suggested in code at line 393)
  • Documenting that the interactive prompt only works when running setup in a TTY environment
  • Supporting environment variables like OLLAMA_DEFAULT_MODEL=codellama:7b for non-interactive deployments (already suggested in review)
docs/AUTOMATIC_OLLAMA_SETUP.md (1)

1-213: Comprehensive documentation with clear troubleshooting guidance.

This document provides excellent coverage of the automatic setup feature, including architecture, flow diagrams, testing procedures, and troubleshooting steps. The environment variable reference and command documentation are particularly helpful.

docs/OLLAMA_AUTO_SETUP_QUICKREF.md (1)

1-129: Excellent quick reference with clear structure.

This quick reference effectively summarizes the auto-setup feature with practical examples, verification steps, and troubleshooting guidance. The table format and clear sections make it easy to scan and find information quickly.

tests/test_llm_router.py (3)

137-145: Test correctly updated to match new initialization behavior.

The test now expects RuntimeError during LLMRouter initialization when no providers are available, which aligns with the updated router logic. The error message assertion validates user-friendly guidance is provided.


423-514: Well-structured end-to-end tests with proper Ollama mocking.

These tests properly mock OllamaProvider to control availability and verify:

  1. Ollama-first routing when available (lines 423-466)
  2. Fallback cascade when Ollama fails (lines 468-514)

The mock setup is deterministic and validates the expected routing behavior.


773-786: Proper Ollama mock for rate limit test.

The test correctly mocks OllamaProvider to simulate Ollama availability, ensuring the rate limit semaphore test is deterministic.

scripts/verify_ollama_setup.sh (1)

18-105: Comprehensive verification script for Ollama setup integration.

The script provides thorough validation of the Ollama auto-setup workflow:

  • Project structure verification
  • Import testing
  • Skipped-mode execution testing
  • Integration test execution
  • User guidance for next steps
cortex/llm/interpreter.py (1)

144-187: Good refactor to use OllamaProvider for local LLM calls.

The _call_ollama method now properly delegates to OllamaProvider:

  • Ensures service is running with clear error messages
  • Selects best model with user guidance when unavailable
  • Uses the provider's complete() method for consistent behavior

The error messages (lines 155-170) provide actionable guidance for users.

tests/test_ollama_integration.py (2)

214-276: Good test coverage for Ollama installation scenarios.

The TestOllamaSetup class covers:

  • Successful installation with subprocess mocking
  • Skip behavior when already installed
  • Failure handling with non-zero return code

The mock configuration properly simulates the installation process.


186-211: Verify mock chain for Ollama completion test.

The test sets mock_ollama.complete.return_value with a dict, but the assertion at line 211 checks response.content.lower(). Ensure the router's _complete_ollama method properly extracts the content from the mock's return value.

If the test fails with a MagicMock in response.content, the mock setup may need adjustment to patch LLMRouter._complete_ollama directly instead of the provider.

#!/bin/bash
# Verify that test_complete_with_ollama passes
cd tests
python -m pytest test_ollama_integration.py::TestLLMRouter::test_complete_with_ollama -v 2>&1 | head -50
setup.py (1)

56-106: Well-designed idempotent post-install hook.

PostEggInfoCommand implements proper safeguards:

  • Marker file prevents duplicate setup runs
  • CI/non-TTY detection avoids blocking automated builds
  • Graceful handling of KeyboardInterrupt
  • Clear user guidance on manual setup
tests/test_ollama_setup_integration.py (1)

95-148: Test orchestration is well-structured.

The main() function provides:

  • Clear test execution with exception handling
  • Summary output with pass/fail counts
  • Actionable next steps on success
cortex/llm_router.py (3)

94-106: COSTS dictionary properly includes Ollama with zero cost.

The cost tracking correctly reflects that local Ollama inference is free.


148-174: Good defensive Ollama initialization with clear guidance.

The initialization:

  • Avoids delays with auto_pull=False
  • Provides helpful messages when service isn't running or models are missing
  • Gracefully handles initialization failures

301-344: Good infinite loop prevention in fallback logic.

The _attempted_providers set tracks which providers have been tried, preventing infinite recursion. The error message at lines 338-342 clearly lists available providers for debugging.

Comment on lines +11 to +12
def run_setup():
"""Run Ollama setup after installation."""
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add missing return type hint.

The function run_setup() is missing a return type hint. Per coding guidelines, type hints are required for all functions.

🔎 Proposed fix
-def run_setup():
+def run_setup() -> None:
     """Run Ollama setup after installation."""

As per coding guidelines, type hints are required in Python code.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-def run_setup():
+def run_setup() -> None:
     """Run Ollama setup after installation."""
🤖 Prompt for AI Agents
In cortex/_post_install.py around lines 11-12, the function signature for
run_setup() is missing a return type hint; update the signature to include an
explicit return type (e.g., -> None) so it follows the project's typing
guidelines, and ensure any related imports or type annotations are added if
required.

Comment on lines +229 to +251
        # Find the matching closing bracket
        if content.startswith("{"):
            # Find matching }
            brace_count = 0
            for i, char in enumerate(content):
                if char == "{":
                    brace_count += 1
                elif char == "}":
                    brace_count -= 1
                    if brace_count == 0:
                        content = content[: i + 1]
                        break
        elif content.startswith("["):
            # Find matching ]
            bracket_count = 0
            for i, char in enumerate(content):
                if char == "[":
                    bracket_count += 1
                elif char == "]":
                    bracket_count -= 1
                    if bracket_count == 0:
                        content = content[: i + 1]
                        break
Contributor

⚠️ Potential issue | 🟡 Minor

Brace/bracket matching doesn't handle JSON strings containing literal braces.

The current logic counts all { and } characters, but this can incorrectly truncate JSON when string values contain literal braces (e.g., {"error": "Missing { in config"}).

For the typical command output format ({"commands": ["..."]}), this is unlikely to cause issues, but be aware of this edge case if parsing failures are reported.

🤖 Prompt for AI Agents
In cortex/llm/interpreter.py around lines 229-251, the brace/bracket matching
logic incorrectly counts braces/brackets that appear inside JSON string
literals; update the loop to track string state and escapes so braces/brackets
are only counted when not inside a string: introduce boolean flags (in_string,
escape) and toggle in_string on unescaped double quotes, toggle escape on
backslashes, and only increment/decrement brace_count or bracket_count when
in_string is False; apply the same string-aware logic to both the '{'...'}' and
'['...']' branches so literal braces inside JSON strings no longer break the
truncation.
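
A sketch of that string-aware variant; the function name is illustrative, only the object case is shown, and the list case is symmetric:

def truncate_to_json_object(content: str) -> str:
    """Cut `content` down to its first complete top-level JSON object."""
    depth = 0
    in_string = False
    escape = False
    for i, char in enumerate(content):
        if in_string:
            if escape:
                escape = False          # previous backslash consumed this char
            elif char == "\\":
                escape = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return content[: i + 1]
    return content                       # unbalanced: return unchanged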

Comment on lines 260 to 265
        except (json.JSONDecodeError, ValueError) as e:
            # Log the problematic content for debugging
            import logging

            logging.error(f"Failed to parse LLM response. Content: {content[:500]}")
            raise ValueError(f"Failed to parse LLM response: {str(e)}")
Contributor

🛠️ Refactor suggestion | 🟠 Major

Move logging import to module level.

The logging module is imported inside the exception handler. Per PEP 8, imports should be at the top of the file. This also avoids repeated import overhead if the exception is raised multiple times.

+import logging
 import json
 import os
 import sqlite3

And remove line 262:

         except (json.JSONDecodeError, ValueError) as e:
             # Log the problematic content for debugging
-            import logging
-
             logging.error(f"Failed to parse LLM response. Content: {content[:500]}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cortex/llm/interpreter.py around lines 260 to 265, the logging import is
inside the exception handler; remove the inline "import logging" on line 262 and
add "import logging" at the top of the module (with the other imports). Keep the
existing logging.error call as-is so the exception handler logs the truncated
content, and ensure there are no leftover local imports in that block.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
README.md (1)

377-381: Resolve inconsistency between "In Progress" status and feature availability.

Line 380 lists "Ollama local model support" as unchecked (In Progress), but the rest of the README describes Ollama as the primary, available feature (e.g., lines 67, 103–105, 137, 181–212). This creates confusion about whether Ollama is production-ready or still experimental.

Please update the Project Status section to reflect the actual state of Ollama integration.

🧹 Nitpick comments (2)
README.md (2)

304-360: Add Ollama-specific troubleshooting guidance.

The Troubleshooting section covers cloud provider configuration and general Python setup but lacks any Ollama-specific diagnostics. Given that Ollama is now the default provider, users will likely encounter Ollama-related issues (e.g., service not running, model pull failures, connection errors).

Consider adding a subsection to help users verify Ollama installation, check service status, and debug model loading.

🔎 Suggested Ollama troubleshooting section
<details>
<summary><strong>"Ollama service not running"</strong></summary>

```bash
# Check if Ollama is installed
ollama --version

# Start Ollama service (if not auto-starting)
ollama serve

# Verify it's running
curl http://localhost:11434

# Check available models
ollama list
```
</details>

<details>
<summary><strong>"cortex-setup-ollama hangs or fails"</strong></summary>

```bash
# Ensure Ollama service is running first
ollama serve &

# Then re-run setup
cortex-setup-ollama

# If still failing, check logs
journalctl -u ollama -n 50
```
</details>

181-212: Clarify default model and first-run verification steps.

The Local LLM Support section provides good model recommendations and fallback explanation, but it doesn't clarify:

  • Which model is the default if the user doesn't explicitly choose during setup?
  • What happens if no models are installed when cortex install is first run?
  • How to verify Ollama connection is working before using Cortex.

Consider adding a quick checklist or note to guide first-time users through: install Ollama → start service → pull a model → verify connection → use Cortex.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c4d35f1 and fe5a528.

📒 Files selected for processing (1)
  • README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Package
  • GitHub Check: test (3.11)
  • GitHub Check: test (3.10)
  • GitHub Check: test (3.12)

Comment on lines +103 to +105
# During installation, you'll be prompted to install Ollama (local AI)
# Choose 'y' for offline/private AI, or 'n' to use cloud providers
```
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, find and examine the post-install hook file
find . -name "*post_install*" -type f | head -20

Repository: cortexlinux/cortex

Length of output: 87


🏁 Script executed:

# Get the exact content from lines 103-105 of README.md
sed -n '103,105p' README.md

Repository: cortexlinux/cortex

Length of output: 205


🏁 Script executed:

# Search for the setup.py or installation script to understand the installation flow
find . -name "setup.py" -o -name "setup.cfg" -o -name "pyproject.toml" | head -10

Repository: cortexlinux/cortex

Length of output: 89


🏁 Script executed:

# Search for any Ollama-related prompts or interactive setup code in the codebase
rg -i "ollama.*prompt|prompt.*ollama|you'll be prompted" --type py --type md

Repository: cortexlinux/cortex

Length of output: 740


🏁 Script executed:

# Examine the post-install hook implementation
cat -n cortex/_post_install.py

Repository: cortexlinux/cortex

Length of output: 1805


🏁 Script executed:

# Check setup.py to see how post-install is configured
cat setup.py

Repository: cortexlinux/cortex

Length of output: 5969


🏁 Script executed:

# Examine the first_run_wizard prompt logic
grep -A 10 -B 5 "Install Ollama now" cortex/first_run_wizard.py

Repository: cortexlinux/cortex

Length of output: 765


🏁 Script executed:

# Search for any issue references or PR discussions about the prompt
rg -i "suyashd|prompt.*doesnt|doesnt.*pop" --type md --type py --type txt

Repository: cortexlinux/cortex

Length of output: 44


🏁 Script executed:

# Examine the setup_ollama function to understand the full flow
cat -n scripts/setup_ollama.py | head -100

Repository: cortexlinux/cortex

Length of output: 3761


🏁 Script executed:

# Check if there are any GitHub issues or PR comments mentioning the prompt issue
git log --all --oneline --grep="ollama\|prompt" | head -20

Repository: cortexlinux/cortex

Length of output: 373


🏁 Script executed:

# Search for recent PRs or issues mentioning Ollama prompt issues
git log --all --oneline | grep -i "PR\|#\|ollama" | head -30

Repository: cortexlinux/cortex

Length of output: 2164


🏁 Script executed:

# Check if there are any discussion files or issue templates
find . -name "*.md" -type f | xargs grep -l "Suyashd999\|ollama prompt" 2>/dev/null

Repository: cortexlinux/cortex

Length of output: 46


🏁 Script executed:

# Look at the full setup_ollama.py to understand the complete flow
wc -l scripts/setup_ollama.py

Repository: cortexlinux/cortex

Length of output: 89


🏁 Script executed:

# Check what happens when setup_ollama is called - is there a wizard or direct setup?
grep -n "def setup_ollama\|def _prompt" scripts/setup_ollama.py | head -20

Repository: cortexlinux/cortex

Length of output: 85


🏁 Script executed:

# Read the complete setup_ollama function
sed -n '358,428p' scripts/setup_ollama.py

Repository: cortexlinux/cortex

Length of output: 2790


🏁 Script executed:

# Check if the wizard is being imported and called
grep -n "import.*wizard\|from.*wizard" scripts/setup_ollama.py

Repository: cortexlinux/cortex

Length of output: 44


🏁 Script executed:

# Look for any conditional logic that might skip the prompt
grep -n "isatty\|interactive\|TTY\|stdin" scripts/setup_ollama.py

Repository: cortexlinux/cortex

Length of output: 242


🏁 Script executed:

# Double-check README context around those lines to understand full installation instructions
sed -n '95,115p' README.md

Repository: cortexlinux/cortex

Length of output: 578


Document the interactive TTY requirement for Ollama setup prompt.

Lines 103-105 promise "you'll be prompted to install Ollama" during python setup.py develop, but the prompt only appears in interactive (TTY) mode. When run in non-interactive shells (piped input, CI/CD, some terminal emulators), the setup silently skips with no user interaction. This creates a gap between documented expectations and actual behavior.

Update the documentation to clarify the TTY requirement, or provide explicit instructions for non-interactive setups (e.g., use cortex-setup-ollama command or set environment variables to skip/customize Ollama installation).

🤖 Prompt for AI Agents
In README.md around lines 103 to 105, the doc claims "you'll be prompted to
install Ollama" during python setup but omits that the prompt only appears in an
interactive TTY; update the text to state the prompt requires an interactive
terminal and will be skipped in non-interactive environments (CI/piped shells),
and add a short note on alternatives: run the included cortex-setup-ollama
command for unattended installs or set environment variables/flags to skip or
preconfigure Ollama installation; keep the wording concise and add examples of
the two options (interactive vs non-interactive) so users know how to proceed in
CI.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@sujay-d07 sujay-d07 marked this pull request as draft December 26, 2025 10:33
@Suyashd999
Collaborator

@sujay-d07 some modules from requirements.txt aren't installed. Please run pip install -r requirements.txt as part of setup.py


Development

Successfully merging this pull request may close these issues.

[FEATURE] Ollama Integration - Local LLM Support
